Free tool

How many servers do you need?

Size your fleet for peak load. Enter your peak traffic and per-request latency and get the number of instances you need — with headroom so you're not running at the redline.

💡 Every value is editable. Defaults are illustrative. Use your own measured peak RPS, average request time and per-instance concurrency for a number you can plan around.

Your peak load

All values are illustrative — replace them with your own measured numbers from load tests or production metrics.

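Instances needed

The calculator reports the instance count, req/s per instance, total safe req/s across the fleet, and headroom percentage. Safe capacity per instance = max req/s per instance × target utilization (70% shown).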

Mind the headroom. At 100% utilization you have no room for traffic spikes, rolling deploys, or instance failure. Target 60–75% so a bad minute doesn't become an outage.

Little's Law, in one line

An instance that runs W requests concurrently, each taking L seconds, sustains about W / L requests per second. So capacity = workers / latency. With 50 workers at 200 ms (0.2 s), that's 50 / 0.2 = 250 req/s per instance at full tilt — then you apply your utilization target for safe capacity.
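The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and the 70% target is just the example value used on this page.

```python
def per_instance_capacity(workers: int, latency_s: float) -> float:
    """Max sustainable req/s for one instance: concurrency / latency (W / L)."""
    if workers <= 0 or latency_s <= 0:
        raise ValueError("workers and latency_s must be positive")
    return workers / latency_s


def safe_capacity(workers: int, latency_s: float, target_utilization: float) -> float:
    """Per-instance req/s after applying the utilization target (0 < u <= 1)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return per_instance_capacity(workers, latency_s) * target_utilization


# The page's example: 50 workers at 200 ms, with an assumed 70% target.
max_rps = per_instance_capacity(workers=50, latency_s=0.2)
safe_rps = safe_capacity(workers=50, latency_s=0.2, target_utilization=0.70)
print(f"{max_rps:.0f} req/s max, {safe_rps:.0f} req/s safe")
# prints: 250 req/s max, 175 req/s safe
```

Note that latency is in seconds here, so 200 ms must be passed as 0.2; mixing up milliseconds and seconds is the easiest way to be off by a factor of 1000.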

Assumes latency holds steady at full concurrency and load is evenly distributed across instances. Real systems have queueing, GC pauses, and uneven load, so load-test to confirm.

Capacity is a design decision, not a guess.

We profile your real latency distribution, find the bottlenecks, and right-size the fleet so you carry headroom without overpaying for idle instances. Book a free build audit.

How it works

Capacity planning, the simple way

The question "how many servers do I need" has a surprisingly clean answer once you know three numbers: peak load, per-request latency, and concurrency per instance. This capacity planning calculator turns them into a fleet size using Little's Law. One instance sustains at most its worker count divided by average request time; multiply that by your target utilization to get its safe capacity, then divide peak requests per second by the safe capacity to get the instance count. Round up, and you have a defensible answer for scaling servers before peak hits.
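As a rough sketch of that calculation end to end, assuming a hypothetical `instances_needed` helper and an illustrative peak of 2,000 req/s (neither comes from the calculator itself):

```python
import math


def instances_needed(
    peak_rps: float,
    latency_s: float,
    workers_per_instance: int,
    target_utilization: float = 0.70,  # assumed default, matching the page's example
) -> int:
    """Fleet size for peak load: peak / (workers / latency * utilization), rounded up."""
    if peak_rps < 0 or latency_s <= 0 or workers_per_instance <= 0:
        raise ValueError("peak_rps must be >= 0; latency and workers must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    max_rps = workers_per_instance / latency_s   # Little's Law ceiling per instance
    safe_rps = max_rps * target_utilization      # capacity you actually plan against
    return math.ceil(peak_rps / safe_rps)        # partial instances do not exist


# Hypothetical inputs: 2,000 req/s peak, 200 ms latency, 50 workers per instance.
print(instances_needed(peak_rps=2000, latency_s=0.2, workers_per_instance=50))
# prints: 12
```

With these inputs the safe capacity is 175 req/s per instance, and 2000 / 175 is about 11.4, which rounds up to 12.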

The piece teams skip is headroom. Running a fleet at 100% utilization looks efficient on a spreadsheet and fails the first time traffic spikes, a deploy rolls, or an instance dies. Targeting 60–75% costs a little more steady-state and saves you from the outage — which is why a good server capacity calculator bakes utilization into the math rather than leaving it as an afterthought.
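To make the cost of skipping headroom concrete, here is a small sketch comparing a fleet sized at 100% with one sized at 70% when a single instance fails. The helper names and the 2,000 req/s peak are illustrative assumptions, and the model assumes load is rebalanced evenly across the survivors.

```python
import math


def utilization(peak_rps: float, max_rps_per_instance: float, instances: int) -> float:
    """Fraction of total fleet capacity in use at peak (1.0 is the redline)."""
    if instances <= 0:
        raise ValueError("need at least one serving instance")
    return peak_rps / (instances * max_rps_per_instance)


def fleet_size(peak_rps: float, max_rps_per_instance: float, target: float) -> int:
    """Instances needed to keep peak utilization at or below the target."""
    return math.ceil(peak_rps / (max_rps_per_instance * target))


PEAK_RPS = 2000.0  # hypothetical peak load
MAX_RPS = 250.0    # 50 workers / 0.2 s, the page's per-instance example

for target in (1.00, 0.70):
    n = fleet_size(PEAK_RPS, MAX_RPS, target)
    normal = utilization(PEAK_RPS, MAX_RPS, n)
    degraded = utilization(PEAK_RPS, MAX_RPS, n - 1)  # one instance down
    print(f"target {target:.0%}: {n} instances, "
          f"{normal:.0%} at peak, {degraded:.0%} with one instance down")
# prints:
# target 100%: 8 instances, 100% at peak, 114% with one instance down
# target 70%: 12 instances, 67% at peak, 73% with one instance down
```

In this example the fleet sized at 100% is pushed past its capacity by a single failure, while the fleet sized at 70% absorbs it with room to spare.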

Input               What it means                          Typical
Peak RPS            Requests/sec at your busiest minute    Measured
Latency             Average request processing time        50–500 ms
Workers / instance  Concurrent threads or connections      10–200
Target utilization  Headroom for spikes & failures         60–75%

FAQ

Questions about capacity planning

How many servers do I need for X requests per second?

Divide your peak requests per second by what one instance can safely handle. At full utilization, one instance handles roughly its worker/thread count divided by the average request time in seconds (Little's Law): for example, 50 workers at 200 ms each is about 250 req/s. Multiply that by your utilization target to get the safe figure (250 × 70% = 175 req/s), divide peak by it, and round up.

What utilization should I target?

Aim for 60–75% at peak. Running closer to 100% leaves no room for traffic spikes, rolling deploys, garbage-collection pauses, or a failed instance — all of which happen routinely. The headroom is cheaper than an outage.

What is Little's Law for capacity planning?

Little's Law says the number of in-flight requests equals throughput times latency. Rearranged for capacity: an instance that can run W requests concurrently, each taking L seconds, sustains about W/L requests per second. It's the simplest reliable way to turn latency and concurrency into a server count.
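The same relationship can be run in the other direction, as a sanity check against production metrics: given measured throughput and latency, estimate how many requests are in flight and what share of the workers they occupy. The function names below are hypothetical, and the inputs reuse this page's example numbers.

```python
def in_flight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: average in-flight requests = throughput x latency."""
    return throughput_rps * latency_s


def worker_utilization(throughput_rps: float, latency_s: float, workers: int) -> float:
    """Share of an instance's workers that are busy on average."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return in_flight_requests(throughput_rps, latency_s) / workers


# One instance serving 175 req/s at 200 ms with 50 workers.
busy = in_flight_requests(throughput_rps=175, latency_s=0.2)
share = worker_utilization(throughput_rps=175, latency_s=0.2, workers=50)
print(f"{busy:.0f} requests in flight, {share:.0%} of workers busy")
# prints: 35 requests in flight, 70% of workers busy
```

If the measured share sits well above your target at peak, the fleet is running closer to the redline than the plan assumed.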
