Cyber Monday is like the Super Bowl of shopping for e-commerce sites. As such, it’s usually not the day to roll out changes to website code. After all, a golden four-hour window during Cyber Monday could make or break sales.
Etsy takes a different tack. For the 14-year-old marketplace, the biggest online shopping holiday of the year–when Etsy sees double the sales and search activity that it does on a normal day–is not off limits for code tweaks. Etsy continuously deploys code onto the site, sometimes as many as 30 times a day, says chief technical officer Mike Fisher. Continual deployment, he argues, helps keep the staff in the rhythm of making fixes quickly too. “I think that’s the best way to keep things stable.”
Cyber Monday 2018 put this strategy to the test. On that day, Etsy’s 2.6 million sellers pulled in an average of around $19,000 in gross merchandise sales per minute. In the middle of it all, a tool the company had released in the days leading up to the shopping holiday malfunctioned. Sellers were supposed to have been able to add their items to a sitewide Cyber Monday sale via their online dashboard; the problem was, the tool didn’t recognize some time zones properly. That meant that for some of the sellers east of Etsy’s Brooklyn headquarters, their sales automatically closed long before Cyber Monday ended.
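Etsy has not described the bug in detail, so the following is only a hypothetical sketch of how a time zone mix-up can close a sale early for sellers east of headquarters. It assumes the sale was meant to end when Cyber Monday ended in New York, and that the faulty code reused that wall-clock time in the seller's own zone. The function names and fixed UTC offsets are illustrative assumptions, not Etsy's code.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets valid in late November (assumption: no daylight saving time).
NEW_YORK = timezone(timedelta(hours=-5), "New York")
BERLIN = timezone(timedelta(hours=1), "Berlin")

# Assumed intent: the sale ends when Cyber Monday (Nov. 26, 2018) ends at headquarters.
SALE_END = datetime(2018, 11, 27, 0, 0, tzinfo=NEW_YORK)


def buggy_sale_end(seller_tz):
    """Hypothetical bug: keep the wall-clock end time but apply the seller's zone."""
    return SALE_END.replace(tzinfo=seller_tz)


def hours_closed_early(seller_tz):
    """Hours between the buggy close and the intended close."""
    return (SALE_END - buggy_sale_end(seller_tz)) / timedelta(hours=1)


print(hours_closed_early(NEW_YORK))  # seller in the headquarters zone
print(hours_closed_early(BERLIN))    # seller east of headquarters
```

Under these assumptions, a seller in the headquarters zone is unaffected, while a seller six hours ahead sees the sale close six hours before it should.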
The company says it communicated the issue to sellers as soon as it figured out what had happened, and about 60 percent of sellers were able to restart their sales. To make up for the mistake, Etsy offered advertising and listing credits to the affected sellers.
While the glitch hasn’t deterred the company from rolling out code changes around the big day, it has helped inform Etsy’s plan of action to prevent possible site issues this year. To prepare for the surge of activity–the site saw 140 product searches per second last year–the 400-person engineering team is making some changes. They’re scaling up their servers, and staff will work extra shifts. The team will also go into what they call code “slush” mode, during which they don’t make any major changes but do continue to push out short lines of code to keep improving the site’s functioning. Fisher notes that many organizations instead prefer a code “freeze,” where sites are essentially untouchable during peak times.
Beyond the slush, Fisher and all of his engineers hold a pre-mortem meeting ahead of the holiday week to brainstorm potential problems, along with possible solutions, before they arise. When imagining potential issues, Fisher says they conceptualize a tree of scenarios, in which a root issue can branch into many related issues. From there, they make a plan for every issue on the tree. For example, in the case of a checkout malfunction, Etsy has a response and action plan for everything from a small cart malfunction to a large-scale event in which the leadership team would need to get involved with a public response.
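The tree of scenarios can be pictured as a simple data structure. The sketch below is an illustration only: the `Issue` class, the `unplanned` helper, and the sample plans are all hypothetical, not anything Etsy has described. It walks the tree and lists every issue that still lacks a response plan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Issue:
    name: str
    plan: str = ""  # empty string means no response plan has been written yet
    children: List["Issue"] = field(default_factory=list)


def unplanned(issue):
    """Return the names of all issues in the tree that have no plan."""
    missing = [] if issue.plan else [issue.name]
    for child in issue.children:
        missing.extend(unplanned(child))
    return missing


# Hypothetical example tree rooted at a checkout malfunction.
checkout = Issue("checkout malfunction", "page the on-call engineer", [
    Issue("small cart malfunction", "roll back the latest deploy"),
    Issue("large-scale outage"),  # plan still to be written
])

print(unplanned(checkout))
```

A check like this would flag the large-scale outage as the one branch still needing a plan before the pre-mortem is considered complete.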
He also prepares the team with worksheets that have action items and incident response plans for situations brainstormed in the pre-mortem. This includes having a plan for communicating issues internally and externally–you need to know how to explain what happened to customers and sellers, too, Fisher suggests.
And of course, it’s never too early to prepare. “In fact, we’ve already started planning for next year,” Fisher says.
Correction: A previous version of this story incorrectly stated that Etsy has made predictions for this year’s Cyber Monday sales. The engineering team created this year’s plan based on last year’s site performance. Last year Etsy saw around $19,000 in sales per minute and 140 product searches per second. The article had also misstated Etsy’s checkouts-per-second rate. The company declined to disclose the correct figure.