Case study · HealthTech · Consumer AI

Calcounti AI: from meal-time timeouts to 99.9% uptime

Israel market · Microservices migration · AI cost & cache layer · Autoscaling

By Yogreet Global Engineering · 11 min read · Updated June 2026

Calcounti AI is an AI-powered nutrition and calorie-tracking app with real daily-active usage in the Israeli market. Like most consumer apps, its traffic isn't smooth — it's concentrated into sharp bursts, right around breakfast, lunch, and dinner, exactly when people open the app to log a meal. That pattern is normal. What wasn't normal was how the backend behaved under it.

2,800 ms → 240 ms: p95 API response time

96% → 52%: peak server CPU

~8% → <0.1%: sessions hitting timeouts

92% → 99.9%: uptime during meal hours

A note on the numbers: the figures and chart shapes on this page are illustrative placeholders that represent the structure and scale of the engagement. They'll be replaced with Calcounti AI's verified metrics once finalized.

The situation

The app's usage curve is the most predictable thing about it: a quiet overnight baseline, then three sharp spikes a day as people log breakfast, lunch, and dinner. Each spike means a wave of near-simultaneous requests — photo uploads, AI food recognition calls, nutrition lookups — arriving in the same few minutes, three times a day, every day.

The original backend wasn't built around that shape. It was built as a single application process handling everything — authentication, the AI food-recognition pipeline, analytics, and push notifications — talking to one database for every kind of read and write. That's a completely reasonable way to start. It's also exactly the kind of architecture that meal-time traffic finds and breaks first.

The problem we found

We measured the app under simulated load that mirrored its real meal-time traffic pattern, and the two charts below are what that revealed.
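As a rough illustration of what a load profile that mirrors meal-time traffic could look like, here is a minimal sketch. It models the day as a flat baseline plus three bell-shaped bumps. Every name and number in it (peak hours, request rates, spike width) is a hypothetical placeholder, not a Calcounti AI measurement, and the actual test harness may have been built quite differently.

```python
import math

# Hypothetical values chosen only to show the shape; not Calcounti AI data.
MEAL_PEAK_HOURS = (8.0, 13.0, 19.5)  # assumed centres of the three spikes
BASELINE_RPS = 5.0                   # assumed overnight requests per second
PEAK_EXTRA_RPS = 120.0               # assumed extra load at the top of a spike
SPIKE_WIDTH_HOURS = 0.4              # assumed spread (std. deviation) of a spike


def target_rps(hour_of_day: float) -> float:
    """Return the simulated request rate for an hour in [0, 24)."""
    rate = BASELINE_RPS
    for peak in MEAL_PEAK_HOURS:
        # Each meal adds a Gaussian-shaped bump centred on its peak hour.
        distance = hour_of_day - peak
        rate += PEAK_EXTRA_RPS * math.exp(-(distance ** 2) / (2 * SPIKE_WIDTH_HOURS ** 2))
    return rate


def daily_profile(step_minutes: int = 15) -> list[tuple[float, float]]:
    """Sample one day as (hour, requests_per_second) pairs."""
    steps = (24 * 60) // step_minutes
    return [(i * step_minutes / 60, target_rps(i * step_minutes / 60)) for i in range(steps)]
```

A load generator could step through `daily_profile()` and hold each rate for the length of the step, which reproduces the quiet-then-spike pattern rather than a constant average.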

Chart: API response time as concurrent users increase (before vs. after)

Chart: Server CPU load across a typical day (before vs. after)

Both charts point at the same underlying issue from different angles: the system had no mechanism for handling its own predictable peaks. It was sized and built for the average minute, not for the three short meal-time windows a day that actually mattered to users.

Root causes

Every meal log triggered a fresh AI model call. Even for common, frequently logged foods, there was no caching layer — the system paid full inference cost and latency on every single request.
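To make the missing piece concrete, here is a minimal sketch of a cache sitting in front of an expensive recognition call. It assumes a meal entry can be reduced to a normalised text key, which is a simplification: the class name, the `recognize` callable, and the one-hour TTL are all hypothetical, and the production cache is likely a shared store rather than an in-process dictionary.

```python
import time
from typing import Callable


class RecognitionCache:
    """In-memory TTL cache in front of an expensive recognition call (illustrative)."""

    def __init__(
        self,
        recognize: Callable[[str], dict],
        ttl_seconds: float = 3600.0,  # assumed freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._recognize = recognize
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(food_text: str) -> str:
        # Normalise case and whitespace so "Hummus " and "hummus" share an entry.
        return " ".join(food_text.lower().split())

    def lookup(self, food_text: str) -> dict:
        key = self._key(food_text)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]  # served from cache, no model call
        self.misses += 1
        result = self._recognize(key)  # the only path that pays inference cost
        self._entries[key] = (now, result)
        return result
```

With something like this in place, repeated logs of the same common food cost one model call per TTL window instead of one per request.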

One process did everything. Authentication, the AI pipeline, analytics, and notifications all ran inside the same application, so a slowdown in one dragged down all of them at once.

One database served every kind of query. Real-time meal logging and heavier analytics reporting competed for the same database connections, at the exact moments load was highest.
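The separation this root cause calls for can be sketched as a small routing rule: writes and latency-sensitive reads go to the primary, while reporting reads go elsewhere. The workload names and connection strings below are hypothetical placeholders, and the sketch deliberately ignores replica lag, which a real router would need to account for.

```python
from enum import Enum


class Workload(Enum):
    MEAL_LOGGING = "meal_logging"  # real-time, user-facing
    ANALYTICS = "analytics"        # heavier reporting queries


# Hypothetical connection strings, for illustration only.
PRIMARY_DSN = "db://primary.example.internal/app"
REPLICA_DSN = "db://replica.example.internal/app"


def choose_dsn(workload: Workload, is_write: bool) -> str:
    """Pick a database endpoint so reporting reads stay off the primary."""
    if is_write:
        # All writes must go to the primary, whatever the workload.
        return PRIMARY_DSN
    if workload is Workload.ANALYTICS:
        # Read-only reporting can tolerate a replica.
        return REPLICA_DSN
    # Real-time reads stay on the primary to avoid reading stale data.
    return PRIMARY_DSN
```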

Fixed server capacity, sized for the average minute. There was no autoscaling, so the server pool that comfortably handled the overnight baseline was the same pool straining at breakfast, lunch, and dinner.
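For contrast with a fixed pool, here is a minimal sketch of how a target replica count might be derived from an expected request rate. The function name, the per-replica throughput, the headroom factor, and the min/max bounds are all assumptions made for illustration. In practice this decision is usually delegated to the platform's autoscaler rather than hand-written.

```python
import math


def replicas_needed(
    expected_rps: float,
    rps_per_replica: float,
    min_replicas: int = 2,   # assumed floor for redundancy
    max_replicas: int = 20,  # assumed cost ceiling
    headroom: float = 0.3,   # assumed safety margin above the forecast
) -> int:
    """Return a replica count for the forecast load, clamped to the bounds."""
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    raw = expected_rps * (1 + headroom) / rps_per_replica
    return max(min_replicas, min(max_replicas, math.ceil(raw)))
```

Evaluated against the overnight baseline this stays at the floor, and evaluated against a meal-time forecast it grows, which is the behaviour a fixed pool cannot provide.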

Uncompressed photo uploads. Meal photos were uploaded at full size before processing, adding avoidable latency on mobile networks — worst exactly when the network was already busy with everyone else's meal photos.

What we rebuilt

None of these were fixed by "adding more servers." Each one needed a specific architectural change — most of which separated something that had been bundled together for no good reason.

Diagram: before and after

Why each change mattered

Food recognition + cache, as its own service: common meals get served from cache near-instantly; the AI model is only called for genuinely new or ambiguous entries — the same cache-and-route pattern behind every cost curve on our homepage, applied here for speed instead of just spend.
Auth and notifications, split out: a slow analytics query can no longer make login or push notifications slow down with it.
Analytics on its own read replica: heavy reporting queries stopped competing with real-time meal logging for the same database connections.
Autoscaling on the food-recognition service specifically: capacity now expands automatically right before breakfast, lunch, and dinner — and contracts overnight, instead of running fixed capacity sized for the worst case, all day, every day.
Image compression before upload: meal photos are resized and compressed client-side before they ever hit the network, cutting upload time on exactly the mobile connections that are busiest at meal times.
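The resizing step in the last item above comes down to simple arithmetic, sketched here in Python for readability even though the real client would be mobile code. The function name and the 1280-pixel limit are hypothetical. The idea is to shrink the longest side to a cap while preserving aspect ratio, and never to upscale.

```python
def fit_within(width: int, height: int, max_side: int = 1280) -> tuple[int, int]:
    """Return dimensions scaled so the longest side is at most max_side."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        # Already small enough; upscaling would only add bytes.
        return width, height
    scale = max_side / longest
    # Round to whole pixels and keep each side at least 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resized image would then be re-encoded at a lower quality setting before upload, which is where most of the byte savings typically come from.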

What this means going forward

The architecture didn't just fix the symptom — slow responses and occasional crashes — it changed what kind of growth the app can absorb without a rebuild. The food-recognition service can now scale independently of everything else, which matters because it's the one piece of the system whose load grows fastest as the user base grows. That's the same 100-to-100,000-user framing behind every Yogreet build: the parts of the system that scale fastest should be the parts that can scale on their own.

Recognize this pattern in your own app?

Predictable usage spikes — meal times, paydays, market open — are exactly what a build audit is for. We'll map where yours will break first.
