CASE STUDY

How Curri autoscales daily on Porter to handle spikes in user traffic

Shankar Radhakrishnan
May 19, 2023
7 min read

Curri is a logistics startup that supercharges supply chain management in the construction space, powering efficiency, allowing for increases in scale, and driving down costs for their clients. They allow sales leaders to reach and delight more customers, ensure smooth and easily scalable operations for location managers, and let logistics leaders drive up their bottom line. Curri accomplishes this through its end-to-end platform, offering everything from an on-demand and nationwide fleet of vehicles to live tracking and notifications to intelligent freight matching. 

The logistics startup was a part of Y Combinator’s S19 batch and raised a Series B round from Bessemer Venture Partners, Initialized Capital, Brick & Mortar Ventures, and Rainfall Ventures. We interviewed Brian Gonzalez, the co-founder and CTO of Curri, and Austin Cherry, Curri’s Infrastructure and Security Manager to see why the startup decided to move to Porter. 

DDoS attacks or user traffic?  

Curri started on Heroku, and the prototypical Platform-as-a-Service (PaaS) served them well as a starter platform thanks to its exceptional convenience. However, that convenience comes with restrictions; once a startup begins to scale, it needs greater control over its infrastructure.

“It felt like the natural progression of a startup–you use Heroku as a toy infrastructure until you hit some scale, then you need to control your destiny.” - Brian Gonzalez, Curri’s co-founder and CTO

A primary concern Curri had with Heroku as they scaled was cost, and the lack of transparency around it, a common theme among Heroku users that have begun to scale. The cost of dynos (Heroku's containers) grows disproportionately to resource usage because dynos do not allow for granular scaling of resources, whether manual or automated. A Performance-M dyno allows for 2.5 GB of RAM; if an infrastructure engineer wants 3 GB, they have to upgrade all the way to a Performance-L dyno, which provides 14 GB. There is no in-between.
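The coarseness of this tiering can be sketched as a simple lookup. The Performance-M and Performance-L figures come from the text above; the Standard tiers are included only for context and are assumptions here, as are the function and variable names.

```python
import math

# Heroku dyno tiers and their RAM, in GB. Performance-M/L figures are from
# the article; the Standard tiers are illustrative assumptions.
DYNO_RAM_GB = {
    "Standard-1X": 0.5,
    "Standard-2X": 1.0,
    "Performance-M": 2.5,
    "Performance-L": 14.0,
}

def smallest_dyno_for(ram_gb: float) -> str:
    """Return the smallest dyno tier whose RAM meets the requirement."""
    for name, ram in sorted(DYNO_RAM_GB.items(), key=lambda kv: kv[1]):
        if ram >= ram_gb:
            return name
    raise ValueError(f"No dyno tier offers {ram_gb} GB of RAM")

# Needing just 0.5 GB more than a Performance-M provides forces a jump
# all the way to a 14 GB Performance-L.
print(smallest_dyno_for(3.0))  # Performance-L
```

In other words, any memory requirement between 2.5 GB and 14 GB pays for the full 14 GB tier, which is the pricing cliff described above.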

The company realized that to “control their destiny”, as Brian aptly puts it, they needed to move off of Heroku and migrate to their own cloud, giving them greater flexibility and a slew of choices unavailable to them while hosted on Heroku. They wanted to choose their own cloud provider and truly own the underlying infrastructure to avoid vendor lock-in. Curri also wanted full visibility into and control over their infrastructure for greater troubleshooting capability and, therefore, greater resiliency.

But the real catalyst for Curri to migrate off of Heroku was the daily spike in networking traffic they experienced. Remember the on-demand and nationwide fleet of vehicles I mentioned earlier? Well, every morning of the work week, right around 5 AM, that entire fleet would wake up and get on the Curri platform, resulting in the locations of tens of thousands of drivers being sent to Curri’s location endpoints over extremely short intervals–almost like a self-imposed DDoS attack!

This massive traffic spike would overwhelm their infrastructure; when traffic goes up, the resources (compute, in particular) required to support the additional users need to scale proportionally. This can be accomplished through autoscaling, either by scaling up (adding more resources to an existing container) or by scaling out (adding more containers)–vertical and horizontal autoscaling, respectively. Heroku supports horizontal autoscaling by increasing the number of dynos, and manual vertical scaling can be accomplished by upgrading dyno types, but that brings the aforementioned cost hike. Unfortunately, autoscaling wasn’t working as intended on Heroku: Heroku autoscales based on request response time, which isn’t always a great proxy for resource usage, and it certainly wasn’t in this case. In terms of magnitude, Curri’s location endpoint serves around 10 to 13 million requests per day, so the need to scale fast during these daily spikes is acute.
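The horizontal side of that equation is simple capacity math: divide peak load by per-replica throughput and round up. A minimal sketch, using hypothetical numbers (the per-replica throughput and spike rate below are illustrative assumptions, not Curri's actual figures):

```python
import math

def replicas_needed(peak_rps: float, rps_per_replica: float,
                    min_replicas: int = 1) -> int:
    """Horizontal scaling: run enough identical replicas to absorb peak load."""
    return max(min_replicas, math.ceil(peak_rps / rps_per_replica))

# Hypothetical: a 5 AM spike of 2,000 requests/second hitting replicas that
# each comfortably handle ~150 requests/second.
print(replicas_needed(2000, 150))  # 14
```

The hard part is not the arithmetic but the trigger: an autoscaler keyed to response time may only react after requests are already queuing, whereas one keyed to resource utilization can act as CPU climbs.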

Curri’s daily traffic; each weekday sees a massive spike in traffic.

They needed to solve this issue, and fast, by autoscaling their servers more granularly based on resource utilization.

Porter to the rescue

By moving to their own AWS account, Curri would be able to own and control their infrastructure, but they still wanted the convenience of a PaaS that provides a layer of abstraction on top; they wanted to take advantage of the benefits of Kubernetes without having their application engineers deal with its complexity. Porter seemed to offer everything they were looking for: they could choose their cloud provider, pick which instances to use, assign granular resources, and have complete agency over how to run their servers.

Since Porter runs on Kubernetes under the hood, autoscaling is an inherent benefit of the platform and is based on resource allocation and utilization by default. Due to Curri’s unique traffic pattern of daily spikes at 5 AM, they needed autoscaling elastic and responsive enough to expand and contract quickly. Porter’s autoscaling accomplished this, and no custom logic had to be configured on Curri’s end to get it up and running: from the Porter dashboard, they simply set target percentage usage for CPU and RAM along with the minimum and maximum number of replicas, and the cluster autoscaler does all the work of adding and removing replicas when necessary.

Furthermore, the battle-tested clusters Porter spins up come with a cluster-wide NGINX reverse proxy that sits behind the load balancer and distributes traffic inside the cluster. This NGINX instance is configured to be highly available so that even if one node goes down, the networking layer stays intact as its pods are dispersed across multiple nodes. The NGINX reverse proxy is also configured to vertically autoscale and gets more RAM assigned as it increases its memory usage. 

Curri was also looking for a quick migration off of Heroku, and Porter made that possible; their migration process took less than a week. But don’t take my word for it:

“Our main goals when moving away from Heroku were to: 1. Reduce cost 2. Increase reliability 3. Have a good scalability story 4. Own the infrastructure to not have vendor lock-in 5. Migrate fully over in under a week. I can reliably say that we have met and exceeded all of our criteria. Also, the Porter team has been one of the fastest, most responsive teams we have ever engaged with.” - Brian Gonzalez, Curri’s co-founder and CTO

Porter handles all the cloud migration work for users who opt for the white-glove migration program. The only friction points Curri experienced during the migration were application-level issues that had to be addressed in their own code base. Although the Curri team did not expect the Porter team to resolve these, Porter’s engineers still dug into them alongside Curri’s engineers to keep the migration as frictionless as possible. The extra care and hands-on support Curri got from Porter made the migration stress-free.

All the flexibility of Kubernetes with the convenience of a PaaS

Although application engineers don’t ever have to go under the hood on Porter, there is always the option to do so for engineers who need to touch infrastructure. Austin, Curri’s Infrastructure & Security Manager, previously worked at Cloudflare and is deeply knowledgeable about Kubernetes–whenever he wants to know what exactly is going on under the hood, he has the freedom to dive into the infrastructure and navigate the cluster. Porter is built on top of open-source standards of the Kubernetes ecosystem such as Helm, allowing Austin to directly configure Helm charts if desired.

Visualization of Porter’s Helm charts that package applications on Kubernetes.

In addition, users can always log into their AWS/GCP console directly if necessary. One example of this relates to compliance; Austin went directly into the company’s AWS console to tag EKS resources during their SOC 2 audit. Curri has full control of the AWS account and can perform any actions they need to under the hood; the Porter platform allows users to configure their infrastructure to suit any compliance requirements, whether that be HIPAA, PCI/DSS, or GDPR since Porter runs applications in their own cloud.

Curri now has a team of twenty engineers, and Austin is the only one who uses this feature or ever goes into their AWS console. The other 95% of the engineering team uses the rest of Porter’s dashboard for rollbacks and roll-forwards, debugging through logs, and occasionally setting up jobs.

“The best part about Porter is that our application developers don’t have to learn about Kubernetes at all; Porter has allowed a team with almost no K8s knowledge to successfully use the platform.” – Austin Cherry, Infrastructure and Security Manager at Curri

Essentially, Curri uses Porter to manage their DevOps completely, with just one engineer ever touching infrastructure. Porter does not restrict DevOps-savvy engineers from maximizing the benefits of Kubernetes but abstracts away the details for application-focused developers. Furthermore, Austin doesn’t have to deal with time-consuming and energy-draining Day 2 operations like cluster upgrades, monitoring, and logging, so he can focus on the part of their infrastructure that’s most relevant to Curri. 

All in all, with Porter, Curri migrated off of Heroku to AWS in under a week, reducing their cloud spend and significantly improving their scalability and reliability. They are able to minimize concerns of vendor lock-in as Porter is simply a middleware layer that sits on top of their own cloud provider. And through Kubernetes-powered autoscaling, they are able to handle spiky traffic with zero hassle. 
