● CASE STUDY · SOCIAL · CONSUMER AI

Sosana: when every feed refresh was a fresh AI bill

LLM CACHING · MODEL ROUTING · READ REPLICAS

Sosana is an AI-powered social platform that generates a personalised feed for every user. The product worked — engagement was climbing. The problem was that the cost of running it was climbing at exactly the same rate, because every single feed refresh called the model fresh. Growth was making the unit economics worse, not better.

By Yogreet Global Engineering · 9 min read · Updated June 2026
−40% monthly server cost
Flat cost per active user
3.4× users on the same infra
71% of feed requests served from cache

A note on the numbers: the figures on this page illustrate the structure and scale of the engagement; exact metrics are shared with Sosana's permission on request.

The situation

A social feed is the worst possible shape for naive AI usage: people pull-to-refresh constantly, often seeing content that hasn't meaningfully changed since the last pull. Sosana treated every one of those refreshes as a brand-new generation — a full model call to rank and personalise the feed, paid for in tokens, every time.

That works fine in a demo. At scale it means your most engaged users — the ones refreshing twenty times a day — are also your most expensive, and cost grows in lockstep with the engagement you're trying to celebrate. The founders could see the AI bill bending upward on the same chart as their DAU, and knew that line couldn't keep going.

The problem we found

We plotted AI spend against active users. Before, the two lines were effectively the same line — cost scaled linearly with engagement. The goal was to bend the cost line flat while the user line kept climbing.

[Chart: Monthly AI spend as active users grow from 10k to 340k DAU. Before: cost tracks engagement 1:1. After: cost decoupled from growth.]

Root causes

1. No caching between refreshes. A refresh seconds after the last one re-ran the full personalisation model, even when the candidate content was identical.

2. Every request hit the most capable model. Simple re-ranks and trivial updates were billed at frontier-model rates, the same as genuinely novel feed generation.

3. Feed reads and write-heavy social actions shared one database. Likes, follows and posts contended with feed assembly for the same connections at peak hours.

4. No notion of "this user's feed hasn't changed." Without a freshness signal, the system had no way to cheaply decide when regeneration was actually warranted.

What we rebuilt

The fix wasn't a cheaper model — it was making sure the model only runs when it has something new to do, and at the right tier when it does.

BEFORE
Feed refresh → Frontier model, every request (full generation, no cache, full token cost)

AFTER
Feed refresh → Freshness check + cache, then one of three paths:
Cache hit (71%): served instantly
Cheap re-rank model: minor updates
Frontier model: genuinely new feed
Underneath: read replica for feed assembly + Redis cache
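To see why the three-way split changes the cost curve, here is a rough sketch of the expected cost of a single refresh. Only the 71% cache-hit rate comes from this page. The function name, the share of misses that go to the cheap model, and the unit costs are all hypothetical placeholders, not Sosana's figures.

```python
def blended_cost_per_refresh(
    hit_rate: float,
    rerank_share: float,
    cost_cache: float,
    cost_rerank: float,
    cost_frontier: float,
) -> float:
    """Expected cost of one feed refresh across the three paths.

    hit_rate      -- fraction of refreshes served from cache (0.71 on this page)
    rerank_share  -- ASSUMED fraction of cache misses handled by the cheap model
    cost_*        -- ASSUMED cost of each path, in whatever unit you bill in
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if not 0.0 <= rerank_share <= 1.0:
        raise ValueError("rerank_share must be between 0 and 1")

    miss_rate = 1.0 - hit_rate
    return (
        hit_rate * cost_cache
        + miss_rate * rerank_share * cost_rerank
        + miss_rate * (1.0 - rerank_share) * cost_frontier
    )
```

With the old architecture every refresh cost `cost_frontier`. Plugging in your own measured hit rate and per-path costs shows how far the blended figure falls below that.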

Why each change mattered

Freshness check before generation: a cheap lookup decides whether anything relevant has changed since the last feed. If not, the cached feed is served for a fraction of the cost and latency.
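One way to picture the freshness check is a per-user version counter that is bumped whenever something relevant changes, with each cached feed remembering the version it was built from. The sketch below is a minimal illustration under that assumption, not Sosana's production code: `FeedCache`, `mark_changed` and `get_feed` are hypothetical names, and plain dictionaries stand in for Redis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FeedCache:
    # Expensive, model-backed feed generation (hypothetical callable).
    generate: Callable[[str], List[str]]
    # In-memory stand-ins for a shared cache such as Redis.
    _versions: Dict[str, int] = field(default_factory=dict)
    _cached: Dict[str, Tuple[int, List[str]]] = field(default_factory=dict)
    model_calls: int = 0

    def mark_changed(self, user_id: str) -> None:
        """Bump the freshness version when something relevant changes."""
        self._versions[user_id] = self._versions.get(user_id, 0) + 1

    def get_feed(self, user_id: str) -> List[str]:
        """Serve the cached feed unless the user's version has moved on."""
        current = self._versions.get(user_id, 0)
        entry = self._cached.get(user_id)
        if entry is not None and entry[0] == current:
            return entry[1]  # cache hit: no model call
        feed = self.generate(user_id)  # cache miss: pay for generation
        self.model_calls += 1
        self._cached[user_id] = (current, feed)
        return feed
```

Two refreshes in a row with no call to `mark_changed` in between trigger one generation, not two. Deciding which events count as "relevant" is the real design work and depends on the product.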

Tiered model routing: trivial re-ranks go to a small, cheap model; the frontier model is reserved for genuinely new feed generation — the same cut-the-bill-without-cutting-quality pattern from our homepage.
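As a rough illustration of the routing decision, the sketch below picks a tier from how much has changed since the last feed. The `Tier` names, the `choose_tier` function and the `rerank_threshold` default are assumptions made for the example; the actual signals and cut-offs used for Sosana are not described on this page.

```python
from enum import Enum


class Tier(Enum):
    CACHE = "cache"
    CHEAP_RERANK = "cheap_rerank"
    FRONTIER = "frontier"


def choose_tier(
    new_candidates: int,
    has_cached_feed: bool,
    rerank_threshold: int = 5,  # hypothetical cut-off, tune per product
) -> Tier:
    """Pick the cheapest tier that can handle this refresh."""
    if new_candidates < 0:
        raise ValueError("new_candidates must be non-negative")
    if not has_cached_feed:
        # Nothing to reuse or re-rank: this is a genuinely new feed.
        return Tier.FRONTIER
    if new_candidates == 0:
        # Nothing relevant changed: serve the cached feed as-is.
        return Tier.CACHE
    if new_candidates <= rerank_threshold:
        # A handful of new items: a small model can slot them in.
        return Tier.CHEAP_RERANK
    # Enough new content that the feed is effectively new.
    return Tier.FRONTIER
```

Keeping the decision in one small, pure function makes the cut-offs easy to test and to adjust as quality data comes in.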

Read replica for feed assembly: heavy feed reads stopped competing with likes, follows and posts for the same database connections at peak.
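The read/write split can be sketched as a small router that sends writes to the primary and feed reads to a replica. This is an illustration only: `ConnectionRouter` and the connection names are hypothetical, and a real setup would hand out pooled database connections, not strings.

```python
import itertools
from typing import Iterator, Optional, Sequence


class ConnectionRouter:
    """Route writes to the primary and feed reads to read replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Round-robin over replicas; None means no replica is configured.
        self._replicas: Optional[Iterator[str]] = (
            itertools.cycle(list(replicas)) if replicas else None
        )

    def for_write(self) -> str:
        """Likes, follows and posts always go to the primary."""
        return self.primary

    def for_feed_read(self) -> str:
        """Feed assembly reads go to a replica, falling back to the primary."""
        if self._replicas is None:
            return self.primary
        return next(self._replicas)
```

One caveat with this pattern in general: replicas can lag the primary, so a read that must reflect the user's own most recent write may still need to go to the primary.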

What this means going forward

The important change isn't the 40%; it's that cost per active user is now flat. Sosana can chase the engagement metric every social product wants without the AI bill chasing it back. Those are the unit economics a startup needs to scale safely: growth that makes the business healthier, not more fragile.

Is your cost scaling with engagement?

If your AI bill grows every time usage does, the fix is architectural, not a cheaper plan. A build audit will show you exactly where it's leaking.
