Vertical to Horizontal Scaling, Part 2: Vertical Scaling @ Stanza


In Part 1, we explored why Stanza switched from vertical to horizontal scaling under explosive growth. In this post, we'll cover our architecture before the switch to better understand the transition.

Microservice Architecture

From the beginning, our backend (a MEAN stack: MongoDB, Express, Angular, Node.js) was built around a microservice architecture. Well-defined, self-contained services allowed us to scale each part of our backend independently, even on a monolithic server. Endpoints rendering different frontend products were kept separate from each other and from API endpoints. Beyond the scaling benefits, this also made the code easier to manage.

Traffic Routing/Load Balancing

Because different services ran on the same physical machine, we used NGINX as a reverse proxy to route traffic based on request path.

Each service was scaled by starting additional processes running the same code, each listening on a different port. NGINX load balanced each service's traffic across its processes in round-robin fashion via upstreams.

# nginx.conf
events {}

http {
    # round-robin load balancing requests to the
    # cloned processes listening on different ports
    upstream service_1_upstream {
        server localhost:8081;
        server localhost:8082;
        server localhost:8083;
    }

    upstream service_3_upstream {
        server localhost:9091;
        server localhost:9092;
    }

    server {
        listen 80;

        # route by request path to the matching upstream
        location /service_1 {
            proxy_pass http://service_1_upstream;
        }

        location /service_3 {
            proxy_pass http://service_3_upstream;
        }
    }
}
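
The routing above can be modeled in a few lines of Python for readers less familiar with NGINX upstreams. This is a simplified sketch, not how NGINX works internally. Real NGINX also supports server weights and failure handling. The sketch picks the longest matching path prefix, as NGINX does for plain prefix location blocks. It then hands the request to the next port in that upstream's rotation. The route function is hypothetical; the prefixes and ports mirror the config.

```python
from itertools import cycle

# Mirrors the upstreams in nginx.conf: one address per cloned process.
UPSTREAMS = {
    "/service_1": ["localhost:8081", "localhost:8082", "localhost:8083"],
    "/service_3": ["localhost:9091", "localhost:9092"],
}

# One round-robin iterator per upstream, shared across all requests.
_rotations = {prefix: cycle(servers) for prefix, servers in UPSTREAMS.items()}


def route(path: str) -> str:
    """Return the backend address that should handle a request for path."""
    # Like NGINX prefix locations, prefer the longest matching prefix.
    matches = [prefix for prefix in UPSTREAMS if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no location matches {path!r}")
    prefix = max(matches, key=len)
    # Hand the request to the next process in round-robin order.
    return next(_rotations[prefix])


if __name__ == "__main__":
    for p in ["/service_1/a", "/service_1/b", "/service_3/x", "/service_1/c"]:
        print(p, "->", route(p))
```

Each service keeps its own rotation. Traffic to /service_3 therefore never disturbs the order in which /service_1's processes are chosen.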

Deployment

Deployment was hand-rolled: we sent requests to a custom deploy service running on the server, which would then run scripts to:

  1. fetch the latest code from the repository of the service being deployed
  2. check out the intended branch
  3. install dependencies
  4. restart the cloned processes to run the new code (we used forever)
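
The four steps could be sketched as a small Python function. This is a hypothetical reconstruction, not Stanza's actual deploy service. Several details are assumptions:

  1. the checkout directory and the origin remote name
  2. the use of npm install for dependencies
  3. the convention that each cloned process was started with forever's --uid option set to <service>_<port>, so it can be restarted by that name

With dry_run=True the function only returns the commands it would run.

```python
import subprocess
from typing import List


def deploy(service: str, repo_dir: str, branch: str, ports: List[int],
           dry_run: bool = False) -> List[List[str]]:
    """Redeploy service from branch; returns the commands in the order run."""
    commands = [
        # 1. fetch the latest code from the service's repository
        ["git", "fetch", "origin"],
        # 2. check out the intended branch, reset to the freshly fetched tip
        ["git", "checkout", "-B", branch, f"origin/{branch}"],
        # 3. install dependencies from the service's package.json
        ["npm", "install"],
    ]
    # 4. restart each cloned process so it picks up the new code
    #    (ASSUMPTION: each was started with forever's --uid set to <service>_<port>)
    commands += [["forever", "restart", f"{service}_{port}"] for port in ports]

    if not dry_run:
        for cmd in commands:
            # Run inside the service's checkout; check=True stops the deploy
            # at the first failing step instead of restarting on broken code.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical path and branch; prints the plan without running anything.
    plan = deploy("service_1", "/srv/service_1", "master", [8081, 8082, 8083], dry_run=True)
    for cmd in plan:
        print(" ".join(cmd))
```

Failing fast matters here. If npm install breaks, the running processes keep serving the old code rather than being restarted into a broken checkout.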

Logging

Logs for each service were written to files, one per process (e.g. service_1_8081.log, service_1_8082.log). In our case, we piggybacked on forever's --logFile option.

Since all logs lived on the same physical machine, simple UNIX commands were enough to search them.

# search the last 5000 lines of each service_1 process's log
# for "Uh oh." (-F matches the literal string, so "." is not a wildcard)
tail -n 5000 service_1*.log | grep -F "Uh oh."
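
Piping tail into grep has one limitation: matched lines no longer say which process logged them. A small Python sketch can keep the last N lines of each file and report the file name alongside each match. The search_logs helper is hypothetical. The glob pattern follows the per-process naming above.

```python
import glob
from collections import deque
from typing import List, Tuple


def search_logs(pattern: str, needle: str, last_n: int = 5000) -> List[Tuple[str, str]]:
    """Return (filename, line) pairs containing needle in each file's last last_n lines."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            # A deque with maxlen keeps only the tail of the file in memory.
            tail = deque(f, maxlen=last_n)
        for line in tail:
            if needle in line:  # plain substring match, like grep -F
                hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    for path, line in search_logs("service_1*.log", "Uh oh."):
        print(f"{path}: {line}")
```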

SSL

SSL was terminated at the NGINX layer before requests were routed to individual processes.

Where do we go from here?

Though we were scaling vertically at the server level, under the hood each service was scaled horizontally via multiple processes. Luckily for us, this model gave us a good head start on horizontal scaling, since the principles were transferable. In Part 3, we'll look at the specific architecture and tools we use for horizontal scaling.

Vertical to Horizontal Scaling, Part 1: Why go Horizontal?


Like most fast-growing startups, Stanza faces a lot of scalability challenges. We sync 250M+ events on a daily basis, handle 10M pageviews a day, and during peak traffic, process 1M+ requests a minute. Part of my work is to keep up with our demanding traffic (we recently grew 100x in just 6 months).

Prior to the 100x explosion, we had always run our backend on a single monolithic server. Whenever traffic grew, we just threw more memory and CPU at it, which, ironically, wasn't so easy.

The Trouble w/Vertical

Grunt Work

As an AWS shop, we are lucky enough not to have to physically manage our servers. Even so, vertical scaling required, at a minimum:

  1. manually spinning up a new instance
  2. provisioning process capacity on the new instance
  3. switching traffic over

At other times, AMIs needed to be created, disks resized, and so on.

One could argue that the above work can be automated (and I agree), but there's a bigger issue…

Cost

EC2 servers are charged based on hours of operation, not CPU utilization/memory usage/[insert_any_non_time_measurements_of_work].

Say we chose an m3.medium instance as our server: we pay for the # of hours that instance is up. Whether those hours are filled with high traffic (great utilization) or low traffic, our bill is the same.

Though traffic at Stanza is growing at a macro scale, traffic on any given day can easily fluctuate by ~10x or more. This means that unless we utilize our servers 100% of the time (hint: we don't), we are paying for beefier CPU, memory, and SSD than we actually use.
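
A back-of-the-envelope Python sketch makes the point concrete. The hourly traffic profile and capacity figure are made up for illustration and are not Stanza's numbers. The only detail carried over from the text is a ~10x swing between the quietest and busiest hours. A server sized for the peak is billed for every hour, so the share of paid-for capacity that does useful work is total load divided by peak capacity times hours.

```python
from typing import List


def paid_capacity_utilization(hourly_load: List[float], capacity: float) -> float:
    """Fraction of billed capacity actually used by a fixed, always-on server."""
    if max(hourly_load) > capacity:
        raise ValueError("server is too small for the peak hour")
    # Billing is per hour of uptime, so we pay for full capacity every hour.
    used = sum(hourly_load)
    paid = capacity * len(hourly_load)
    return used / paid


# Hypothetical day: 10 load units overnight, rising to 100 at peak (a 10x swing).
day = [10] * 8 + [40] * 6 + [100] * 4 + [40] * 6
print(f"{paid_capacity_utilization(day, capacity=100):.0%}")  # prints 40%
```

In this made-up day, 60% of what we pay for sits idle, even though the server is never too small.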

Our costs need to look like our utilization. Elasticity FTW!

Redundancy/Geo-locality

Any monolithic setup, large or small, suffers from a lack of redundancy. When something goes wrong with the sole server, all traffic dies.

A single monolithic server also resides in one location, which means requests from far away have longer round-trip times. This usually results in a poorer end-user experience.

Horizontal to the Rescue

Instead of growing the capacity of a single server, horizontal scaling grows the # of servers, keeping each server's capacity the same.

Less Grunt Work

By leveraging container technologies such as Docker and AWS provisioning services like Elastic Beanstalk, we can easily automate replication, deployment, and traffic switching.

Cost Savings

Because each server now has a smaller capacity, we can spin servers up and down as needed. Since we are charged by server hours, our costs follow traffic patterns. The smaller each server's capacity is relative to total traffic, the more closely costs track traffic.
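
The point about granularity is easiest to see with numbers. The Python sketch below uses a made-up traffic profile and made-up per-server capacities. For each hour it runs just enough servers (ceil(load / capacity)), then compares the capacity billed with the capacity the load strictly needs. It assumes instant scaling and ignores price differences between instance sizes, so treat it as an illustration of the trend rather than a cost model.

```python
import math
from typing import List


def billed_server_hours(hourly_load: List[float], capacity: float) -> int:
    """Server-hours billed if each hour runs just enough servers of this capacity."""
    return sum(math.ceil(load / capacity) for load in hourly_load)


def overprovisioning(hourly_load: List[float], capacity: float) -> float:
    """Billed capacity divided by capacity actually needed (1.0 is a perfect fit)."""
    billed = billed_server_hours(hourly_load, capacity) * capacity
    return billed / sum(hourly_load)


# Hypothetical hourly load units (peak of 100, ~10x swing across the day).
day = [10] * 8 + [40] * 6 + [100] * 4 + [40] * 6
for capacity in (100, 25, 5):
    print(capacity, round(overprovisioning(day, capacity), 2))
```

With this profile, the ratio falls from 2.5 with a single peak-sized server to 1.25 with quarter-sized servers. It reaches 1.0 with servers a twentieth of the peak size, at which point costs track traffic exactly.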

Better Failover/Consistent Performance

With multiple servers, we no longer depend on a single server being healthy. If one fails, we can route traffic to the others and avoid downtime.
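
Here is a minimal Python sketch of that failover logic. It round-robins over a pool of servers and skips any that a health check has marked down. The HealthyPool class and the way servers get marked up or down are hypothetical. In practice a load balancer would handle this, for example NGINX's max_fails/fail_timeout server parameters or the health checks of an AWS load balancer.

```python
from typing import Iterable, List, Set


class HealthyPool:
    """Round-robin over servers, skipping those currently marked unhealthy."""

    def __init__(self, servers: Iterable[str]):
        self.servers: List[str] = list(servers)
        self.down: Set[str] = set()
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def pick(self) -> str:
        # Try each server at most once per call, resuming where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.down:
                return server
        raise RuntimeError("no healthy servers left")
```

Requests only ever land on servers that are currently healthy. A single failure therefore reduces capacity instead of taking the whole service down.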

In addition, requests can be handled by servers in the same geographic region as the user. No request has to travel very far, so round-trip times stay fairly consistent.

Where do we go from here?

Because of the grunt work, cost, and lack of redundancy/geo-locality, vertical scaling starts to show its shortcomings beyond a certain size. In Part 2, we'll explore technical specifics at Stanza to lay the groundwork for our journey from vertical to horizontal scaling.