Tiny AIs vs. Server Logs: Who's Ready for Real Time?

Kari Jaaskelainen

13 Jan 2026

System logs are messy and massive. This study tests whether small language models (SLMs) can read them both reliably and quickly. Rather than treating severity labels as the end goal, the authors use severity classification as a probe of how well a model understands runtime logs.

Best performer: Qwen3-4B hit 95.64% accuracy with retrieval (RAG).
Tiny but mighty: Qwen3-0.6B reached 88.12% with RAG despite weak performance without retrieval.
RAG is not magic: Gemma3-1B jumped from 20.25% to 85.28% with retrieval, while some reasoning models (Qwen3-1.7B, DeepSeek-R1-Distill-Qwen-1.5B) got worse.
Speed matters: Most Gemma/Llama variants ran under 1.2 s per log; Phi-4-Mini-Reasoning took over 228 s per log while scoring under 10% accuracy.
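
To make the retrieval setup concrete, here is a minimal sketch of retrieval-augmented severity classification with a strict output check. It is not the paper's pipeline: the label set, the token-overlap retriever, the prompt wording, and the helper names (`retrieve`, `build_prompt`, `parse_label`) are all assumptions made for illustration, and the model call itself is left out.

```python
import re

# Hypothetical label set; the paper's actual severity classes may differ.
SEVERITIES = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def tokens(line):
    """Lowercased alphanumeric tokens of a log line, as a set."""
    return set(re.findall(r"[a-z0-9]+", line.lower()))


def retrieve(query, labeled_logs, k=2):
    """Return the k labeled logs most similar to the query (Jaccard overlap).

    A stand-in for a real retriever such as an embedding index.
    """
    q = tokens(query)

    def score(item):
        t = tokens(item[0])
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(labeled_logs, key=score, reverse=True)[:k]


def build_prompt(query, examples):
    """Assemble a prompt that shows retrieved examples and demands one label."""
    lines = ["Classify the severity of the final log line.",
             "Answer with exactly one of: " + ", ".join(SEVERITIES) + ".",
             ""]
    for text, label in examples:
        lines.append(f"Log: {text}")
        lines.append(f"Severity: {label}")
        lines.append("")
    lines.append(f"Log: {query}")
    lines.append("Severity:")
    return "\n".join(lines)


def parse_label(raw):
    """Accept the model output only if it is exactly one known label."""
    cleaned = raw.strip().upper()
    return cleaned if cleaned in SEVERITIES else None


# Hypothetical labeled history used as the retrieval corpus.
history = [
    ("Connection to db01 timed out after 30s", "ERROR"),
    ("User alice logged in from 10.0.0.5", "INFO"),
    ("Disk usage at 91% on /var", "WARNING"),
]
query = "Connection to db02 timed out after 45s"
prompt = build_prompt(query, retrieve(query, history))
```

With an exact-match parser like this, an answer such as "The severity is ERROR" is rejected even though it names the right label. That is one way a tight output rule can penalize models that tend to produce longer answers, though the summary does not say this is why some reasoning models scored lower.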

Bottom line: architecture, training, and the ability to use retrieved context under tight output rules decide which models perform well. For AIOps, root cause analysis (RCA), and digital twins, small deployable models look promising for real-time use.

Paper: https://arxiv.org/abs/2601.07790v1


AI AIOps MLOps Observability DevOps LLM RAG EdgeAI Benchmarking DigitalTwin Logs
