How We Helped a Magento Store Scale to New Heights

During the COVID-19 pandemic, in late March, one of our Magento 2 Commerce clients saw a massive surge in traffic and sales. In partnership with Creatuity, the client’s development agency, we worked to scale up their site to handle this surge in visitors.

This blog post will look at the steps we took on the hosting end to meet this demand.

Identifying Bottlenecks

Visitors to the client’s website went through the roof thanks to news exposure and marketing. From one night to the next, concentrated bursts of visits grew 5 to 10 times larger, which put the cluster under real strain.

The hosting cluster set up for their store was designed to handle large amounts of traffic and orders, and scaling up PHP nodes would allow for increased traffic. But when commercials aired nationwide, with media attention high and everyone at home watching TV, the traffic arrived all at once. That was quite the challenge, and we found that bottlenecks were quickly reached.

Here is what the New Relic graph looked like when these traffic spikes occurred initially:

[Image: New Relic graphs from the initial traffic spikes]

As you can see, performance dropped precipitously. We quickly identified that Redis was the first bottleneck we had to overcome (see the next section for the specific fixes we implemented). After Redis, we found that using a single MySQL database for everything in Magento was the next hurdle. Between the Redis and MySQL work, we also made many smaller tweaks and improvements.

Implementing Structural Changes

Redis:

We quickly discovered that Redis uses a lot of data transfer under heavy traffic. In fact, it completely saturated the 1 Gbps private network card, which caused PHP requests and website actions to back up quickly. We had to split Redis off onto its own server (away from MySQL) and, more importantly, upgrade the private network to 10 Gbps.

We did this quickly, and Redis could then comfortably burst into the 1 to 2.5 Gbps range at peak traffic, with no more slowdowns from network saturation.
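To illustrate the kind of check that exposes this sort of saturation, here is a minimal Python sketch that turns two readings of a network interface’s byte counter into a utilization figure. The function name, the sample numbers, and the 80% warning level are illustrative assumptions, not the tooling used on this cluster.

```python
def link_utilization(bytes_start: int, bytes_end: int,
                     interval_s: float, link_gbps: float) -> float:
    """Return the fraction of link capacity used between two byte-counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits_per_second = (bytes_end - bytes_start) * 8 / interval_s
    return bits_per_second / (link_gbps * 1e9)


# Hypothetical sample: 1.2 GB transferred during a 10 second window.
SAMPLE_BYTES = 1_200_000_000
WARN_AT = 0.8  # assumed warning level, not a value from the article

for link in (1.0, 10.0):
    used = link_utilization(0, SAMPLE_BYTES, 10.0, link)
    status = "SATURATED" if used >= WARN_AT else "ok"
    print(f"{link:>4} Gbps link: {used:.0%} used ({status})")
```

With these sample numbers, the same traffic that nearly fills a 1 Gbps link uses only about a tenth of a 10 Gbps link, which is the headroom the upgrade was meant to provide.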

MySQL:

Once Redis was running optimally, we found that when traffic spiked all at once, MySQL had a hard time processing regular Ajax web requests, customer logins, add-to-cart actions, and checkouts at the same time. We quickly determined that splitting the database into three databases (Main, Cart, and Checkout) with Magento Commerce would pay big dividends.

We worked with Creatuity to migrate the code to Commerce and split out the databases. Once that was done, MySQL could comfortably scale with high levels of traffic and orders without getting bogged down during peak traffic. The new setup also supports replicated databases for read-only access, which further increases throughput.
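The idea behind the split can be sketched in a few lines of Python: each workload gets its own primary database, and read-only queries can be spread across replicas. This is a conceptual illustration only. In Magento the routing is handled through the platform’s own configuration rather than hand-written code like this, and every host name below is hypothetical.

```python
from itertools import cycle


class DatabaseGroup:
    """One primary for writes, plus optional read-only replicas."""

    def __init__(self, primary, replicas=()):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = cycle(list(replicas) or [primary])

    def host_for(self, read_only=False):
        # Reads rotate through the replicas; writes always hit the primary.
        return next(self._readers) if read_only else self.primary


# Hypothetical hosts mirroring the Main / Cart / Checkout split.
GROUPS = {
    "main": DatabaseGroup("db-main", ["db-main-ro1", "db-main-ro2"]),
    "cart": DatabaseGroup("db-cart"),
    "checkout": DatabaseGroup("db-checkout"),
}


def route(workload, read_only=False):
    """Pick a database host for a workload: 'main', 'cart', or 'checkout'."""
    return GROUPS[workload].host_for(read_only)


print(route("main", read_only=True))   # db-main-ro1
print(route("main", read_only=True))   # db-main-ro2
print(route("cart", read_only=True))   # db-cart
print(route("checkout"))               # db-checkout
```

The point of the design is isolation: a burst of catalog browsing lands on the main database and its replicas, while cart updates and order placement each have a primary of their own.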

Misc. Changes:

In addition to the big changes with Redis and MySQL, we worked on improving performance in a number of areas:

  • Tweaking network parameters (TCP/IP level) to improve throughput and private network performance (bare metal servers and a dedicated network are a distinct advantage in this area)
  • Memory adjustments to MySQL, Redis, and other services to make more efficient use of available memory
  • PHP-FPM configuration changes to perform faster at scale
  • Load balancer adjustments to even out performance across the web nodes
  • Better use of the CDN for static assets to offload NFS and load balancer utilization
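As one example of the kind of sizing that goes into PHP-FPM tuning, a common rule of thumb sets pm.max_children to the memory available for PHP divided by the average memory of one worker process. The Python sketch below applies that rule. The server size, reserved memory, and per-worker figure are assumed values for illustration, not the settings used on this cluster.

```python
def max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: int) -> int:
    """Estimate pm.max_children: usable RAM divided by average worker size."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0:
        raise ValueError("reserved_mb leaves no memory for PHP-FPM")
    return usable_mb // avg_worker_mb


# Assumed web node: 32 GB RAM, 4 GB kept for the OS and other services,
# and roughly 128 MB per PHP-FPM worker under load.
print(max_children(total_ram_mb=32768, reserved_mb=4096, avg_worker_mb=128))  # 224
```

The per-worker figure is best measured on a live node under realistic load, since it varies with the installed extensions and the pages being served.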

The End Result:

After all of these changes were made in a short period of time, we were able to easily handle over 12,000 simultaneous visitors on the site. The infrastructure was able to support a peak order volume of 5 to 6 orders per second! The New Relic graph now shows a system that is working as it should:

[Image: New Relic graphs after the changes]

Monitoring is Key

None of these changes would have mattered if we didn’t have an effective monitoring system in place for each component in the cluster. Being able to detect an issue before it balloons into a major problem helps keep things running smoothly. We monitor things such as:

  • NFS monitoring for speed, performance, and errors
  • TCP/IP monitoring for packet issues and saturation problems
  • Load Balancer monitoring looking for errors and saturation conditions
  • Web node monitoring for performance, errors, and content delivery speed
  • Continuous monitoring of add to cart and checkout performance to pinpoint a drop in speed

Not only are these things monitored, but proactive action is taken whenever a “tripwire” exceeds its defined threshold. This has been the key to keeping the hosting cluster running smoothly.
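A tripwire of this kind boils down to comparing a metric reading against a threshold and naming the action to take when it is exceeded. Here is a minimal Python sketch of that idea. The metric names, thresholds, and actions are hypothetical examples, not the actual monitoring configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tripwire:
    metric: str       # name of the monitored value
    threshold: float  # trips when the reading goes above this
    action: str       # what to do when it trips


def tripped(readings: Dict[str, float], tripwires: List[Tripwire]) -> List[Tripwire]:
    """Return the tripwires whose reading exceeds the threshold.

    Metrics with no reading are skipped in this sketch.
    """
    return [
        t for t in tripwires
        if t.metric in readings and readings[t.metric] > t.threshold
    ]


# Hypothetical tripwires loosely based on the monitored areas listed above.
TRIPWIRES = [
    Tripwire("private_net_utilization", 0.80, "investigate network saturation"),
    Tripwire("checkout_seconds", 3.0, "check database and web node load"),
    Tripwire("lb_errors_per_minute", 10.0, "inspect load balancer and web nodes"),
]

readings = {
    "private_net_utilization": 0.92,
    "checkout_seconds": 1.4,
    "lb_errors_per_minute": 0.0,
}

for wire in tripped(readings, TRIPWIRES):
    print(f"{wire.metric} exceeded {wire.threshold}: {wire.action}")
```

In a real setup the readings would come from the monitoring system itself, and the action would page someone or trigger an automated response instead of printing a line.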

It’s a good problem to have when you’re scaling up to handle tens of thousands of shoppers in a short period of time. Both LexiConn and Creatuity rose to the challenge and helped this Magento Commerce client deal with unprecedented demand for their products.

If you are looking for a web host that can help your Magento store scale, help you get the most out of the software, and let you sleep well at night, don’t hesitate to reach out to us. :)

