Vertical to Horizontal Scaling, Part 1: Why go Horizontal?


Like most fast-growing startups, Stanza faces a lot of scalability challenges. We sync 250M+ events on a daily basis, handle 10M pageviews a day, and during peak traffic, process 1M+ requests a minute. Part of my work is to keep up with our demanding traffic (we recently grew 100x in just 6 months).

Prior to the 100x explosion, we had always run our backend on a single monolithic server. Whenever traffic grew, we just threw in more memory and CPU, which, ironically, wasn’t so easy.

The Trouble w/Vertical

Grunt Work

As an AWS shop, we are lucky enough not to have to physically manage our servers. Even so, vertical scaling required, at a minimum:

  1. manually spinning up a new, larger instance
  2. provisioning the capacity of processes on the new instance
  3. switching traffic over to it

Other times, AMIs needed to be imaged, disks needed to be resized, etc.

One could argue the above work can be automated (and I agree), but there’s a bigger issue…

Cost


EC2 servers are charged based on hours of operation, not CPU utilization/memory usage/[insert_any_non_time_measurements_of_work].

Say we chose an m3.medium instance as our server. We are paying for the # of hours that instance is up. Whether those hours were filled with high traffic (great utilization) or low traffic, our bill is the same.

Though traffic at Stanza is growing at a macro scale, traffic on any given day can easily fluctuate by ~10x or more. What this means is that if we aren’t utilizing our servers 100% all the time (hint: we aren’t), we are still paying for the beefier CPU, memory, and SSD.
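
To make the waste concrete, here is a minimal sketch of how utilization works out for a single server that has to be sized for peak traffic. The traffic numbers are made up for illustration (they are not Stanza's real traffic), and `utilization` is a hypothetical helper name.

```python
def utilization(hourly_traffic):
    """Fraction of paid-for capacity actually used by a peak-sized server.

    Assumes the server is sized exactly for the busiest hour and is
    billed for every hour it is up, busy or not.
    """
    peak = max(hourly_traffic)
    paid_capacity = peak * len(hourly_traffic)  # billed for all hours at peak size
    used_capacity = sum(hourly_traffic)         # what the traffic actually needed
    return used_capacity / paid_capacity


# Hypothetical day of traffic (requests per minute, one value per hour)
# with a 10x swing between the quietest and busiest hours.
day = [100_000] * 8 + [400_000] * 8 + [1_000_000] * 2 + [300_000] * 6

print(f"utilization: {utilization(day):.1%}")
```

With a 10x swing like this one, roughly two thirds of what we pay for sits idle.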

Our costs need to look like our utilization. Elasticity FTW!

Lack of Redundancy/Geo-locality


Any monolithic setup, large or small, suffers from lack of redundancy. When something goes wrong with the sole server, all traffic dies.

A single monolithic server also resides in one location, which means requests from places far away will have longer roundtrip times. This usually results in a poorer end user experience.

Horizontal to the Rescue

Instead of growing the capacity of a single server, horizontal scaling grows the # of servers, keeping the capacity of each server the same.

Less Grunt Work

By leveraging container technologies such as Docker and AWS provisioning services like Beanstalk, we can easily automate the replication, deployment, and traffic switching.

Cost Savings

Because a single server is now of smaller capacity, we can spin servers down/up as we need. Since we are charged based on server hours, our costs will follow traffic patterns. The smaller each server is relative to total traffic, the more closely our costs can track it.
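
Here is a rough sketch of that granularity effect. It assumes price is proportional to capacity, that servers can be added or removed on hour boundaries, and that traffic is known in advance. All of those are simplifications, and the traffic numbers and function names are hypothetical.

```python
def server_hours(hourly_traffic, capacity_per_server):
    """Server-hours needed if we run just enough servers each hour."""
    # -(-t // c) is integer ceiling division: servers needed for traffic t.
    return sum(-(-t // capacity_per_server) for t in hourly_traffic)


def paid_capacity(hourly_traffic, capacity_per_server):
    """Total capacity paid for, assuming price scales with capacity."""
    return server_hours(hourly_traffic, capacity_per_server) * capacity_per_server


# Hypothetical day of traffic (requests per minute, one value per hour).
day = [100_000] * 8 + [400_000] * 8 + [1_000_000] * 2 + [300_000] * 6

# Compare one peak-sized server against fleets of smaller servers.
for capacity in (1_000_000, 250_000, 100_000):
    print(capacity, paid_capacity(day, capacity))
```

In this made-up example the traffic itself adds up to 7,800,000 units. One peak-sized server pays for 24,000,000, a fleet of 250,000-unit servers pays for 11,000,000, and a fleet of 100,000-unit servers pays for exactly 7,800,000.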

Better Failover/Consistent Performance

With multiple servers we no longer depend on a single server being healthy. If one fails, we can route traffic to another one and avoid downtime.
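
As a toy illustration of that routing idea (not how any particular AWS load balancer is implemented), here is a round-robin router that skips servers marked unhealthy. The `Router` class and the server names are hypothetical.

```python
class Router:
    """Round-robin over a fixed list of servers, skipping unhealthy ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, or raise if none are left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers")


router = Router(["web-1", "web-2", "web-3"])
router.mark_down("web-2")
print([router.route() for _ in range(4)])  # web-2 never appears
```

With a single server in the list, one `mark_down` call is enough to make `route()` fail, which is exactly the monolith's problem.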

In addition, requests can be handled by servers in the same geo-location as the user. No request has to travel very far, and thus roundtrip times will be fairly consistent.

Where do we go from here?

Because of grunt work, cost, and lack of redundancy/geo-locality, vertical scaling starts to show its shortcomings after a certain size. In Part 2, we’ll explore technical specifics at Stanza to set up the background for our journey from vertical to horizontal scaling.