Daily engineering insights, practical guides on architecture and AI cost, plus five real product stories — the infrastructure decisions made early, or made too late, that shaped what they became. Same lens we use on every Yogreet build: what would this have cost to get right from day one?
Learn how to route non-urgent AI tasks to the Batch API, reducing costs by ~50% without degrading the user experience.
Explore prompt caching versus fine-tuning for LLM cost reduction in startups.
Learn how to decide which requests should escalate to frontier models, balancing performance against cost.
Explore semantic caching for LLM apps to cut costs by 70%, along with the accuracy pitfalls to watch for.
The four levers that actually move an AI bill — caching, routing, batching, output discipline — ranked by impact, with the quality trade-offs spelled out.
Not a religious war — a staging decision. When a modular monolith wins, the three signals that justify a split, and how to migrate without a rewrite.
The real cost drivers — tokens, infrastructure, data — why per-user cost creeps up, and how to keep it flat from 100 to 100,000 users.
The four causes of the expensive rewrite that lands right when growth works — and how designing clean seams early avoids it entirely.
How a tiny team avoided a rewrite by designing their database to be shard-friendly before they ever needed to shard it.
WhatsApp didn't out-hire its way to scale. It out-architected everyone else's headcount with one unfashionable language choice.
A single database corruption in 2008 triggered a seven-year, full-stack rebuild — the most expensive "re-architecture spike" in tech history.
The product Tiny Speck spent years building wasn't the one that mattered. The internal tool built "just to get by" was.
The one story on this list where nobody had to learn the lesson the expensive way — because the architecture was the product.