The holiday rush isn’t just stressful for shoppers – it’s make-or-break for retailers. In this episode of PLATO Panel Talks, host Mike Hrycyk sits down with QA experts Susan Finley and Samta Kaura (PLATO) to unpack how teams prepare for holiday and Black Friday testing. 

They dive into the real risks of downtime, the importance of performance and functional testing, and why code freezes never quite hold. From managing last-minute patches to coordinating with third-party systems, the panel shares lessons learned from years in retail QA and offers tips for keeping sites stable when traffic and tension are at their peak. 

Whether you’re in e-commerce, logistics, or QA leadership, this episode reveals how a smart testing strategy can turn chaos into confidence during the busiest season of the year. 

 

Episode Transcript:

Mike Hrycyk (00:00):

Hello everyone. Welcome to another episode of PLATO Panel Talks. I’m your host, Mike Hrycyk. So, when I say testing for the holidays, most retail web presences out there have special rules, special tricks, special ways of handling retail testing, preparing for the holiday season. When we talk about the holiday season here, we’re mostly talking about Black Friday through Christmas. And this is interesting because a lot of us will work in retail, but one of the things we’re going to find out is that maybe the holiday testing isn’t restricted to retail. And so, to introduce my panel, I’ve invited two people with lots of experience in retail testing, and we’ll start with Susan.

Susan Finley (00:39):

Sure. Susan Finley. I’ve been working in quality assurance testing for over 15 years. I am currently working with a consultancy. Also, I’ll be speaking at Star Canada this October. I’m doing a keynote on why you didn’t actually hire me to fix quality. It’s just sort of my little story about my own career and how I’ve often been recruited and hired with organizations saying we need to fix testing. And I generally join and find out that the issues don’t really have a lot to do with testing. They tend to be systemic in terms of how they actually approach developing and releasing software.

(01:13):

Often, yes, we can improve the way we test for sure, and there’s lots of creativity we can bring to the table, but there also are a lot of systemic things that impact the quality that software testing can absolutely surface, but it’s usually a broader scale that I look at when encompassing looking at quality and improving testing.

Mike Hrycyk (01:31):

Great. Thanks for being here today, Susan. Samta, tell us about yourself.

Samta Kaura (01:35):

Hi, my name is Samta Kaura, and I have around 15 years of experience in quality. I have worked with a lot of retail clients like Hudson’s Bay and Cymax Group. I worked for them in their quality department to help set the quality goals for their websites.

Mike Hrycyk (01:52):

Perfect. So, normally I just sort of jump right into some questions, but I knew that there was a lot of stats out there in this stuff and I want to scare you first so that you listen more intently. So, the worldwide Black Friday spend 2024 was 74.4 billion, which was up 5% from the last year. That’s 74.4 billion in one day. Globally, 69% of Black Friday e-commerce came from mobile. That’s practically three-quarters of the spend is going into mobile. About 24 and a half percent of merchants had downtime exceeding five minutes during Black Friday and Cyber Monday in 2019, and 6.6% of storefronts crashed for more than 30 minutes. And that sounds bad, but let’s think about how bad that sounds.

(02:45):

Some of the projections, so this was for 2024, suggest that the downtime during high traffic periods like Black Friday could cost retailers about $540,000 US per hour for a medium to a large site, or $9,000 per minute. And think about the last outage, the last problem that you ever encountered with somewhere that you worked. How many minutes did it take to get back up? Was it less than a minute? I am pretty sure that after that first minute of $9,000, the president is sitting there going, Are we up yet? Are we up yet? Are we up yet?
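The per-minute figure follows directly from the hourly projection. As a rough sketch of that arithmetic, assuming the loss accrues at a flat rate (the function name and the sample outage lengths are illustrative, not from the episode):

```python
# Projection quoted in the episode: about $540,000 US per hour of downtime.
HOURLY_COST_USD = 540_000


def downtime_cost(minutes: float, hourly_cost: float = HOURLY_COST_USD) -> float:
    """Estimated revenue lost for an outage of `minutes`, assuming a flat rate."""
    return hourly_cost / 60 * minutes


# Illustrative outage lengths: one minute, five minutes, thirty minutes.
for minutes in (1, 5, 30):
    print(f"{minutes:>2} min outage: about ${downtime_cost(minutes):,.0f}")
```

Real losses are unlikely to be flat across the day, so this is only a way to put a number in front of the business, not a forecast.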

(03:19):

Even short downtimes, five to 10 minutes can happen to well-known retailers during Black Friday and these can correlate with noticeable sales losses. When you couple all this to the fact that, so the big sites have big numbers for sure, but when you think about the smaller businesses, they still live in this area that the spend between Black Friday and Christmas, that’s the shopping period of the year that gives you your profit, right? Everything holds your doors open and that money spent up until Christmas is how you can buy Christmas. So, it’s a really important time. So, all of that said, now that you’re all terrified and probably makes this question a little more easy, but simple, obvious question in retail, why is the holiday season so important? Let’s start with you, Susan.

Susan Finley (04:05):

Well, I think it’s exactly what you said. It is the difference between an organization making money or not for the year, right? It is what puts them in the black. And if you look at it like it’s 40% or more of their actual annual revenue. Being down within that time can mean the difference of actually making any money for the year. It’s incredibly stressful, and it’s not even just that it’s one day; it’s like several months of having to keep that system stable.

Mike Hrycyk (04:33):

Samta, anything to add?

Samta Kaura (04:34):

Yeah, there are traditional emotions of the customer linked to these days because it’s a festival season. They want to buy stuff for the family, they want to get gifts, they want to decorate their house during those seasons. So, that is another reason I would say that retail is important during those days of the holiday season. The customers and their emotional, traditional values also add on.

Mike Hrycyk (04:59):

I sort of said, but we’ll just make sure that it’s out there. Samta, what is the holiday season? When does it start? When does it end?

Samta Kaura (05:07):

So, it starts generally early November. Nowadays, pre-Black Friday sales start early November, and it goes till Christmas and beyond Christmas, like Boxing Day. And some retailers are going Black Friday month or Boxing Day month. So, for the last end period of the year, November to December, I would say is when there are a lot of customers in the market. Again, I said it’s all traditional and emotionally connected because of all the festival seasons coming during those months. So, I would say early November to early January.

Mike Hrycyk (05:43):

I hadn’t thought about that. Yeah, we are seeing a lot of businesses that don’t want to be just one of the other people in a Black Friday sale. So, then the period becomes more important to them.

Susan Finley (05:52):

I would add the pre-sales and the post-sales, right? So, they’re also doing sales in January, still trying to drive through some savings to customers. Lots of returns are happening through to January, too. So, it extends out, and I think Samta’s point about the customer is key. It’s where you’re going to build your customer trust, right? If you break their trust when they’re depending on you the most because they’re looking for those gifts to come and be on time, it’s a critical point in any customer relationship for sure.

Mike Hrycyk (06:19):

And I think it’s important to point out that it’s not just Judeo-Christian holidays, it’s not just Christmas. I can think of a half dozen other religions that have a festival in that period. It seems that the end of this year, the end of the calendar year, is grouped with lots of festivities. It’s not just gift giving, it’s also buying your decorations.

(06:40):

So, one of the things that gets talked about a lot in testing for the holidays is performance, but is that all we’re talking about here? Is that the only important thing for testers around the holidays? Susan?

Susan Finley (06:53):

It definitely is critical. Performance is critical for just the number of shoppers that are being driven towards your site, as that volume increases. But I would say that the functional accuracy, especially in things like checkout, payment processing, and promotions, is probably, if not more important, just as important. It’s the areas that will have the biggest gains and impact to the monetary impact of the business. So, functional accuracy in those areas, I think, is just as important.

Samta Kaura (07:26):

Yeah, definitely, functional testing is important and performance. Security testing is also one of the components which is critical during those days. We don’t want customers – I am pretty sure that all the hackers have their eye on those days because they know there will be a lot of people doing transactions, putting their credit cards on, putting their PII [Personally Identifiable Information] on, so security would also be one of the important aspects. Other than that, like you said, there were 40% checkout done by mobile. Mobile testing is also one of the key components when we will be in this season.

Mike Hrycyk (08:05):

When we talk about functional, there’s user experience: can you do what you want to do in a way that makes you happy? But there’s even just base functionality. People are incredibly resilient when they really want to get something done. They want to buy this thing, they’ll try and bully their way through a bug to get there and I love consumers for that. But there’s also the fact that, well, what happens if you introduce a bug so that when you put your stuff in your cart, it adds an extra three of them, or it decides your promise date should be next June and doesn’t display it properly? All sorts of different functions. You need that stuff to work, and not just that, you need it to actually work. Can a person go in and put stuff in their cart? It’s not just performance. If someone can’t do it well enough, they’ll be upset, but if they can’t do it at all, then all that money we talked about that’s on the table, it’s not coming to you.

(08:52):

So, I’m going to juggle questions a little just to confuse you. If uptime and a bug-free experience are so important, why does it seem like there’s such a push to get new features in right before the holiday? Everything we’ve said makes it sound like, well, that’s just a kind of a dumb idea. Samta, why is it?

Samta Kaura (09:09):

That’s the season when there are a lot of marketing campaigns going on. There are a lot of codes on the site available. So, it is a critical period for a company, but they want their best processes out. So, that’s why there is a push of putting those features and putting those marketing campaigns on time on the websites so that they can tell the customers what is going on with them. So, I think that’s one of the points where there is a push for putting those new features on.

Mike Hrycyk (09:37):

Alright, Susan, anything to add to that?

Susan Finley (09:39):

I mean, I think that it’s generally revenue pressure. I think Samta’s hitting it on the nose. I do think that retailers are constantly looking at how can they reach more customers, how can they increase sales? And so, the push right before their biggest sale season is to get new payment options in or something else that would personalize that experience for the customer, that would make them want to shop or buy something else. So, they’re always looking at what the different ways are that we can drive customers to purchase more, and I think that season coming up is where they start to push that we really want these new features out. It’s probably one of the most challenging things for us to push back on sometimes because you are making the case of save the base versus increase the sale, and it’s a definite balancing act. It’s all revenue. It’s a pressure point, for sure.

Mike Hrycyk (10:33):

I don’t often feel in these podcasts that I have to push my own bona fides, but I worked for a company for five years as their director of QA. And they were retail-focused. They were online, and they did photo sites. Part of our customers were Costco, Walmart, and a bunch of others like that. And the really big term that doesn’t get pushed around a lot – well, no, it does. It’s the word differentiate. You need to differentiate yourself from your competition. Everyone is competing with everyone else. And so, whether it’s a new product, whether it’s a new look, whether it’s things on your site just work better, or they can get it to you faster, or your promise times are actually believable. It’s not – I mean, that’s another point, it’s not always that your stuff promises to get to you fastest, which is important in the holidays. You want to have a chance to wrap it and see it. It’s that you can have faith in that promise. So that can be a differentiator, right? Being stable can be a differentiator, and so those differentiators are really important. You absolutely said it. It’s a balancing act. Because you need the new features, you need the new products, you need the stability to make sure that people will come. But then it does have to be stable, and it does have to work. So, you need that stability. You need to test for that stability. One of the biggest ways that they push for and guarantee – air quote “guarantee” – that stability is code lockdown or a code freeze. So, we’ll start and explain what that is. Susan, tell us what a code freeze is.

Susan Finley (11:56):

So, usually, a code freeze or lockdown means that they’re going to push no more changes to the existing systems. The systems will be frozen for a certain period of time. It’s a great method for helping to ensure stability in a system. And it’s actually many systems. It’s not just, especially in retail, it’s usually quite a few systems that get locked down for these periods, but all industries have this at some point within their calendar year. It’s not just retail. Banks do it during RRSP season. Tax services do it during tax season, where they’ll lock down their systems or have a lockdown period.

Mike Hrycyk (12:33):

Perfect. Samta, when is the lockdown? Is lockdown the day before? Is it the Wednesday before Black Friday? What are the rules around lockdown?

Samta Kaura (12:43):

It all depends upon different clients or different companies. Generally, the best is to start early. Say if Black Friday is the end of November, we can have the code freeze by mid-November. So, generally, you should have a week or two of stability so that whatever the tester or the QA team has done before that is frozen, and it can help the company be confident that whatever the client is seeing is working in good condition.

Mike Hrycyk (13:13):

Susan?

Susan Finley (13:13):

It’s usually you try to do several weeks before the actual event or period starts. So, I would do several weeks before Black Friday because you’re probably pushing a lot of change in just before that lockdown. So, you want a few weeks to flush out if it’s stable, do we have to patch anything? Are we seeing anything unusual? But yeah, I would be hitting November 1st as a lockdown period.

Mike Hrycyk (13:36):

When you think about it, sometimes the code going in is supporting a feature or a product that doesn’t actually become live until Black Friday. So, let’s say lockdown is on November 1st, and it’s not even live until Black Friday. So, I guess you’re not testing the stability of the feature, you’re testing the stability of the code around it.

Susan Finley (13:55):

Yeah, a hundred percent. That can absolutely happen, especially with promos and that type of thing that aren’t actually going to kick off until that day. It kind of makes the whole lockdown process, you know, it’s really all encompassing. It’s like you’ve got a lockdown, but then you have to have a way to also patch, and you have to have a way to launch and make sure that you’ve got eyes on things as you’re launching, because some things maybe won’t get touched until that point.

Mike Hrycyk (14:19):

So, in your experience, how sacrosanct is lockdown?

Susan Finley (14:24):

I can say that for me, I have never seen anything completely locked down. So, there’s always patches or things that squeeze in. It’s still important to have the process. And what’s really important is to have a process to allow for patches to come in, and how are we going to manage those and make sure they don’t get backed up one behind the other. But I’ve never seen a complete lockdown on a system ever.

(14:46):

What it does do, though, to call it a lockdown, is that it makes everything that goes in get hyper-focused and reviewed and really passes a critical test that there’s a solid business case for why this is coming through. But I have yet to see a lockdown actually hold.

Samta Kaura (15:02):

Yeah, it’s always in the papers, but in reality, there is nothing – even I haven’t seen any lockdown. And that too few weeks before. I have seen a day before sometimes where we won’t push anything just a night before the Black Friday, but I even haven’t seen any lockdown few weeks before. There are always patches coming in which are on high priority, and everyone is working on them beyond their working hours as well. But I haven’t seen any real lockdown happening.

Mike Hrycyk (15:36):

Well, and maybe there’s a terminology problem with that. A patch is a fix, right? And so, if you said this is locked down, therefore we live with the problem, that doesn’t make any sense, right? Patches are necessary, if they’re necessary. That’s part of what that is: you put a high-level executive approval process in, so you make sure it’s an executive signing on the risk. But then the pressure comes back on tests to accurately depict the risk because patches are inherently risky. They don’t get as much testing because they’re necessary, and they need to be there fast.

(16:08):

In my six years in retail, when we were probably supporting 15 to 20 clients at the same time, I don’t think any single lockdown date held. The scary part isn’t patches, though. The scary part is, okay, we need three weeks, but they need that feature, and so now, how close are you willing to go? And I have had intelligent clients who, when that new product feature was risky and it wasn’t there yet, made the call for it not to go, but no one’s happy about that. Because that new feature was going to generate them a new income. There’s possibly supply chains set up for it. It’s a gutsy move to hold to the lockdown. It’s also a gutsy move to break the lockdown.

(16:49):

Alright, so you sort of hinted at this, Susan. Retail lockdown isn’t just about the front end. What other kinds of things do you need to lock down?

Susan Finley (16:59):

Well, usually they’ll lock down a lot of the point-of-sale systems, fulfillment systems. It usually is pretty all-encompassing that all of the dependent solutions that are being used by customers or brick-and-mortar stores are locked down during that period. They’re really trying not to introduce change into anything that would cause an impact to sales. So, it usually holds true right across the board. I think even in other organizations that aren’t retail, you see the same thing, that it’s all encompassing.

Mike Hrycyk (17:29):

Anything to add there, Samta?

Samta Kaura (17:30):

Yeah, there are many key companies which integrated with e-com sites like freight. So, I have this beautiful defect, which we found when I was doing tests for some other company. We locked down because it’s an e-commerce company, but the freight company, which had its APIs at the back, didn’t have the code freeze. So, they pushed something a day before, and suddenly the checkout stopped on the day of the Black Friday. So, it’s always important to, I mean, patches are fine, but major fixes should not go before Black Friday or the sale when it starts. So, that is one key example I had where it’s always important for other integration systems to also go into code freeze or at least not pushing the big changes.

Mike Hrycyk (18:17):

I think a factor we haven’t really talked about, so we talk about making sure that sales can happen and make sure the promise times are there and those are available. But when we start talking about freight or fulfillment or warehouses, it’s not just if things break, those places live on workflow. If you think about Amazon, the throughput that they have, the amount of stuff that they have to get done, and those include steps. First A happens, and then B happens, C happens, and then you find the right box and that all works, right? But if you implement a change to that process without appropriate testing, that might impact the speed of throughput. And then suddenly, so if your average throughput is a thousand orders a day in July and then the week after Black Friday it’s 30,000 orders a day, and you’ve introduced even just a 1% change in throughput, suddenly it’s really, really important. Suddenly, you’re not getting stuff to people over Christmas, and there are slowdown effects from that, too. As soon as you start missing your promise time, your calls to customer service go up, and they don’t go up a little bit. A thousand orders to 30,000 orders, that means you might get 29,000 additional calls that day. The scale of problems that can come is something that testers who haven’t worked in retail just might not get. There’s a reason that when we do performance testing and your average usership is a thousand that we test at 50,000. The scope of thinking about that is a lot bigger than we have thought about. Right?
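To see why a 1% slowdown matters more at holiday volume, here is a rough back-of-the-envelope sketch. It assumes fulfillment capacity was sized exactly to the day's order volume, so any slowdown turns directly into orders that miss same-day processing; the function name and that sizing assumption are illustrative only.

```python
def orders_delayed(daily_orders: int, slowdown: float) -> int:
    """Orders that miss same-day processing when throughput drops by `slowdown`.

    Assumes capacity exactly matched `daily_orders` before the change
    (slowdown=0.01 means a 1% loss of throughput).
    """
    return round(daily_orders * slowdown)


# Volumes quoted in the episode: 1,000 orders a day in July,
# 30,000 a day the week after Black Friday.
for label, volume in (("July", 1_000), ("week after Black Friday", 30_000)):
    print(f"{label}: {orders_delayed(volume, 0.01)} orders delayed per day")
```

Under these assumptions the same 1% change delays 10 orders a day in July and 300 a day at peak, and the peak backlog keeps growing every day it goes unfixed.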

Susan Finley (19:48):

It definitely is, and I think Samta’s point about that third party that was doing the freight is that these retailers usually also reach out to all those third parties, too and ask them to lock down. They usually are asking them, How are you going to support us through this holiday season and come up with some pretty solid plans. And I’ve worked on both sides of that, being with a third-party provider as well as the retailer, and having to come back and tell them what our plans are. And actually, making for some things that are identified, services that are identified as mission critical, making sure that they actually also have a launch war room or something ongoing to support you if you find that you need to scale quickly or you need to respond to an issue that’s happening. It could be happening with them, or it could be that it’s going to flow through to them, but making sure that your third-party partners are also on point to support you through that time is another big part of that whole retail journey.

Mike Hrycyk (20:40):

Well, that becomes complicated because those third-party partners are also supporting others, and if everyone agrees to the same lockdown date, that would be great, but guess what?

Susan Finley (20:50):

Oh, definitely. Yeah.

Samta Kaura (20:51):

Yeah. It’s not always the case, but if the same date is for all, that would be, yeah.

Mike Hrycyk (20:59):

But then those dates become a competitive edge. It’s like, aha, well, you’re going to lock down on October 31st. We’ll take it another week. Retail. It’s cutthroat.

(21:08):

Okay, so we’ve talked a lot about the importance of why we should do it. How do you test for safety and stability when it’s such an important time? How do you approach things differently as a tester?

Samta Kaura (21:20):

This is definitely an important point. Definitely, we can’t just do it just a few weeks before the lockdown. So, it has to be scheduled into the yearly testing plan when we will do the security testing of the application. For example, in my previous experience, they implemented fraud detection for the website. So, we planned it months before the Black Friday sale so that the system gets some time to get stable, and moreover, we can do it phase-wise, so we can see how the website reacts when we put all those fraud detections in place. It’s not something which is impromptu, where we can just decide one day we’ll do it. It needs proper test planning for the security testing. I would say that’s the biggest risk.

Mike Hrycyk (22:05):

Well, it takes time to plan, schedule and make performance testing work properly.

Samta Kaura (22:11):

It all follows. So, if we have security in place, then we’ll do performance on top of it. It’s not that you do performance first, then do the security, and then do it – it’s all in line. It has to be planned way before the actual holiday season.

Mike Hrycyk (22:28):

Susan, how do you test for success in the holiday season?

Susan Finley (22:32):

Well, I think prioritizing, being ruthless about it, really looking at risk and starting with your highest risk things first and working your way through. You can’t test exhaustively. Really partnering with the business to say, Hey, these are the highest risk things. We are going to start our testing here, and working our way down is really important.

(22:49):

I think the other thing that I like to lean into, too, is asking, well, when this goes out, how are we monitoring it? How are we going to make sure we’re okay? Working that into being part of the testing, even getting into synthetic monitoring, I think, is key. Really pushing back to make sure that monitoring is in place, that’s just testing and production, that’s constantly going to run and going to help you stay ahead of anything that may be happening or start to happen. Getting ahead of it before it actually has a larger impact on the customer or the business. I think in a nutshell, I’m all about risk. Risk and prioritization. And making sure that you’re running along those high-risk areas first. We’d love to test exhaustively, right? It’s just never the reality of where we are actually working.
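One common way to make "start with your highest risk things first" concrete is to score each area by likelihood and impact and sort on the product. The sketch below assumes a 1 to 5 scale for both; the areas and scores are made-up examples, not figures from the panel.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    likelihood: int  # assumed scale: 1 (rarely breaks) to 5 (breaks often)
    impact: int      # assumed scale: 1 (cosmetic) to 5 (blocks revenue)

    @property
    def risk(self) -> int:
        # Simple risk score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def prioritize(areas):
    """Return areas ordered from highest to lowest risk score."""
    return sorted(areas, key=lambda area: area.risk, reverse=True)


# Hypothetical scores, agreed with the business before testing starts.
areas = [
    Area("wishlist", 2, 1),
    Area("checkout", 3, 5),
    Area("promotions", 4, 4),
    Area("payment processing", 2, 5),
]

for area in prioritize(areas):
    print(f"{area.risk:>2}  {area.name}")
```

The scoring itself is the least important part; the value is in having the business agree on the ordering before the time pressure starts.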

Mike Hrycyk (23:30):

Yeah, I think another thing is planning for failure, which means you’re probably going to need at least one patch. So, figure out your approval process. But not just that, make sure you have a rollback capability if that’s going to be a problem, make sure that the rollback is tested, investigate using flags so that maybe you can safely turn things off. But also, do you have environments set up that you can test a patch quickly? Because if you need a patch on Black Friday, you really, really need a patch on Black Friday. So, how do you get it? Is there an environment you can use? Can you get it to the environment? Can you test it for an hour and then get it out with some level of safety?

Susan Finley (24:07):

Definitely. I would even add to that, and I’ve been in too many scenarios where I see patches back up behind other patches. There’s a patch that’s particularly troublesome that’s failing in testing, but there’s two other critical patches now backed up behind it. If you haven’t managed your code right or your environments right, you may have to retest those other patches, like move that other one aside. It’s all about making sure your process is really tight to allow these patches to flow through independently. Putting them behind feature flags is another way that you can look at doing it. But how can you make sure that there isn’t a dependency between these changes that is going to cause some type of unintended consequence if they go out in a different sequence or different order? How do you keep them isolated and flowing independently is really important.
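As a minimal sketch of what keeping patches isolated behind feature flags can look like, the example below puts two hypothetical patches behind separate flags so either one can be switched off without touching the other. The flag names, prices, and the in-memory flag store are all assumptions for illustration; a real system would read flags from configuration or a flag service.

```python
class FeatureFlags:
    """Tiny in-memory flag store; unknown flags default to off."""

    def __init__(self, **flags: bool):
        self._flags = dict(flags)

    def is_on(self, name: str) -> bool:
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout_total_cents(subtotal_cents: int, flags: FeatureFlags) -> int:
    shipping_cents = 999
    # Hypothetical patch A: free shipping on orders of $50 or more.
    if flags.is_on("patch_free_shipping") and subtotal_cents >= 5_000:
        shipping_cents = 0
    total = subtotal_cents + shipping_cents
    # Hypothetical patch B: flat gift-wrap fee, independent of patch A.
    if flags.is_on("patch_gift_wrap_fee"):
        total += 300
    return total


flags = FeatureFlags(patch_free_shipping=True, patch_gift_wrap_fee=True)
both_on = checkout_total_cents(6_000, flags)

# Patch A misbehaves in production: turn it off without redeploying patch B.
flags.set("patch_free_shipping", False)
only_b = checkout_total_cents(6_000, flags)

print(both_on, only_b)
```

Because each patch checks only its own flag, the test matrix stays small: each flag on and off, rather than every release ordering of the patches.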

Mike Hrycyk (24:51):

Also, remember that access to people is going to be choppy because it’s the holidays.

Samta Kaura (24:55):

So, we can take the test-early approach, where you start testing with the help of unit tests when we start developing the stuff, rather than at the end. That can help in delivering the patches smoothly and in identifying the bugs earlier, so that it doesn’t happen on the day when we have the patch delivery. That is also one way.

Mike Hrycyk (25:18):

Okay, we’re coming to the end of our time. Do you have a testing story about the holidays or lockdown that you could leave us with? We’ll start with you, Samta.

Samta Kaura (25:31):

Yeah, so as I said, there was this year where we found that defect when the other integrated systems didn’t do the code freeze. After that, the next year when we came into the Black Friday sales, we came up with the idea of doing a small sanity test every morning so that everything which is high priority – checkout and everything – is working smoothly for those days when the sales are going on and when we anticipate the highest revenues. So, that is one of the approaches we came up with, and it helped a lot during those days. Everybody in the company was satisfied that, oh, everything is working fine, at least we don’t have any issues functionally or with the high-priority items. So, that is one thing we did when we did the testing for the retail.
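A daily sanity run like this can be as simple as a list of named checks that each report pass or fail. The sketch below is one hypothetical shape for it: the check functions are stand-ins that return fixed results, where real ones would drive the site or call its APIs, and the simulated freight timeout is invented for the example.

```python
from typing import Callable, Dict, List


def run_sanity_checks(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every named check and return the names of the ones that failed.

    A check that raises is recorded as a failure instead of stopping the run,
    so one broken integration does not hide the state of the others.
    """
    failures = []
    for name, check in checks.items():
        try:
            passed = check()
        except Exception:
            passed = False
        if not passed:
            failures.append(name)
    return failures


# Stand-in checks for illustration only.
def can_add_to_cart() -> bool:
    return True


def can_check_out() -> bool:
    # Simulates a third-party dependency failing, as in the freight story.
    raise TimeoutError("freight quote API did not respond")


failed = run_sanity_checks({
    "add to cart": can_add_to_cart,
    "checkout": can_check_out,
})
print("failed checks:", failed)
```

Run on a schedule against production, the same list of checks starts to look like the synthetic monitoring mentioned earlier in the episode.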

(26:15):

And there was another client where I was working, and they came up with a different strategy where everybody came in early in the morning, and then we did live testing on the environment before we switched, before we made it live. Before that, we did testing on the environment with the feature flags. So, we did that testing in the morning to make sure one process of checkout is working fine, and then we switched it to the live environment. That way, everybody was confident. In that open room, there were testers, there were project managers available, there were developers available, and front-end developers available. So, everybody was there to see if there is a defect. There were two different approaches I worked on, and both gave good confidence to the clients.

Mike Hrycyk (27:00):

Yes, but I wouldn’t want to be the test manager who’s saying, by the way, everyone, you’re working Black Friday morning. Susan, do you have a story?

Susan Finley (27:10):

Yeah, I have another defect, which really has to do with discount stacking. It’s one of the things that happens quite a bit when you’ve got multiple promotions happening, like you’ve got a buy one, get one and somebody gets a discount code and you start applying these things in different orders or a different sequence of events, and what the calculations start to do can often lead to the customer getting a significantly bigger discount than the retailer intended in the first place. So, it’s something that we pay a lot of attention to, which is really understanding. I call it the promo code shuffle. When we’re testing, we’re really shuffling through what variations can happen during checkout, and whether all our business rules are applying the same way. Because often we’ll find that there’s a bug lying deep in those. Just the sequence of events alone, different promotions and using even different payment methods and things like that can all factor in.
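The promo code shuffle lends itself to a small automated check: apply the same set of promotions in every possible order and flag any case where the totals disagree. The sketch below assumes two made-up promotions (a 20% code and a $10-off code) and works in integer cents; the promotion names and helper functions are hypothetical.

```python
from itertools import permutations


def percent_off(pct: int):
    """Promotion that takes `pct` percent off the running total (in cents)."""
    return lambda cents: cents - cents * pct // 100


def amount_off(off_cents: int):
    """Promotion that takes a fixed amount off, never going below zero."""
    return lambda cents: max(0, cents - off_cents)


def totals_by_order(subtotal_cents: int, promos: dict) -> dict:
    """Map each ordering of promotion names to the resulting total in cents."""
    results = {}
    for order in permutations(promos):
        total = subtotal_cents
        for name in order:
            total = promos[name](total)
        results[order] = total
    return results


# Hypothetical promotions applied to a $100.00 cart.
promos = {"SAVE20": percent_off(20), "TENOFF": amount_off(1_000)}
results = totals_by_order(10_000, promos)

for order, total in results.items():
    print(" then ".join(order), "->", f"${total / 100:.2f}")

if len(set(results.values())) > 1:
    print("Totals depend on the order the promotions are applied in")
```

Here the customer pays $70.00 if the percentage code goes first and $72.00 if the fixed amount goes first, which is exactly the kind of difference a business rule has to settle explicitly. With more promotions the number of orderings grows factorially, so in practice the shuffle is usually limited to the combinations the business actually allows.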

Mike Hrycyk (28:05):

I remember it was a while ago, though. I remember that when code freeze happened, the entire testing and development team would take a deep breath, and then things coasted towards Christmas with giant spikes of terror when there was a problem and you had to patch it. But mostly suddenly, because you’ve been pushing so hard to get those things released starting in August, maybe July, if you guys are really on top of things and pushing, pushing, pushing, pushing to get that stuff out, get that stuff out for testing. It’s always right up at the end because it took development too long to get it there, and then you’re pushing, pushing, pushing, and then suddenly it’s Black Friday and you’re like, ah, now I can relax. So, you can. But retail is really about spikes of stress, and it’s exciting, but it’s also stressful.

(28:51):

Alright, thank you to our panel for joining us in this really great conversation about testing for the holidays, what’s important, and helping testers who haven’t worked in retail understand that there’s a lot that goes into making your shopping across the holidays work really well. Thank you to our listeners for tuning in. I think there’s a lot to be learned here. If you have anything you’d like to add to the conversation, we’d love to hear your feedback. You can find us @PLATOTesting on Facebook, LinkedIn, and our website. You can find links to all of our social media and website in the episode description. If you’re enjoying listening to our technology-focused podcast, we’d love it if you could rate and review PLATO Panel Talks on whatever platform you’re listening to. Thank you again for listening, and we’ll talk to you again next time.