Blog

Insights from the engineering trenches.

Practical writing about deploying LLMs, building RAG pipelines, and making AI work in production. No hype, just lessons learned.

May 14, 2026 · Daniil Matkov

The Speedup Is in the Plumbing: Reading Gemma 4's MTP Drafters Carefully

Google's Multi-Token Prediction drafters promise up to 3x faster Gemma 4 inference. The real story is in the engineering — and where the headline number quietly breaks down.

LLM · Inference · Speculative Decoding · Gemma

April 13, 2026 · Daniil Matkov

The Quality-Detectability Tradeoff: Why Worse Models Evade AI Detectors

AI detectors are trained on frontier model outputs. This creates a paradox: the better your model, the more detectable your text. Here's how degraded generation exploits that gap.

AI Detection · Fine-tuning · LLM
