PLATO Panel Talks is taking on Software Testing, AI and Machine Learning with our host, Mike Hrycyk, sitting down with Sean Wilson (Senior Director, Engineering, WindRiver) and Nathaniel Couture (VP Quality Engineering at Xtract One Technologies Inc.). How does AI impact the future of the testing profession? The panellists tackle this pressing question by exploring the benefits and challenges of incorporating AI into testing processes, how AI may change the nature of testing, and the ethical considerations and risks associated with AI in software development.
Episode Transcript:
Mike Hrycyk (00:00):
Hello everyone. Welcome to another episode of PLATO Panel Talks. I’m your host, Mike Hrycyk. Today we’re going to talk about AI, machine learning, and how these things relate to testing. It’s a question that is on everyone’s mind these days. We have been doing our own investigations. We’re using tools that have AI here at PLATO, so it is part of what we’re doing, and we want you to understand it in the same way that we do. I’m going to turn it over to our panel of experts and let them introduce themselves – Sean?
Sean Wilson (00:28):
Hi, I’m Sean Wilson. I’m currently the Senior Director of Engineering for WindRiver Software. I work in the studio developer portion of our product. I’ve been in testing since I got my first manual test job back in 1998, actually working on financial treasury software, and I’ve been through various test and development roles since then.
Mike Hrycyk (00:46):
Thanks, Sean. Welcome back. Alright, Nat?
Nathaniel Couture (00:49):
Hi everyone. My name is Nat Couture, and I’m currently the Vice President of Quality Engineering at Xtract One. I’ve been in quality assurance, testing, and software engineering for roughly 22 years. I dabbled in consulting, worked on products like, you know, the world’s first portable CAT scan machine, and now I’m working on a weapons detection portal for sports venues. Over the years, AI has been introduced, and the products here at Xtract One use machine learning models. So it’s kind of cool. That’s a little bit about me.
Mike Hrycyk (01:25):
Thanks Nat. Welcome back. For those of you who don’t remember, Nat worked with PLATO for three years. He owned our automation and performance groups, and it was great working with Nat. So let’s get started. We’re going to be a little bit cliche and do the same thing we’ve often done. Let’s define what we’re talking about. What is AI, what is machine learning? Are they actually different things? There are some questions around that. So, we’ll start with you, Sean.
Sean Wilson (01:50):
Sure. So artificial intelligence, it’s a subset of computer science, which means absolutely nothing. But basically the idea behind artificial intelligence is to build systems that can think like a human. So, the ability to adapt to input, to discover, to infer or to reason is the overall general concept of artificial intelligence. Machine learning is a subset of that, so they’re not the same thing. Machine learning is in the Venn diagram for all things AI, but machine learning is more about teaching systems to learn from data without needing direct programming. So I want to give it data and have it be able to do something with that, so it can learn from the data without me having to explicitly code how to do it. And then there’s a whole set of broad subcategories underneath that extend it out.
Mike Hrycyk (02:35):
To be super generic, which I love to do: AI is about thinking about what you’re going to do and learning, and machine learning is about giving the AI the knowledge and background to help it think.
Sean Wilson (02:47):
Yeah, it’s a way of working with that. What humans do with thinking relies a lot on data processing and what we’ve learned in the past. And it’s part of making a machine or a system that can actually have artificial intelligence, to be able to infer and make decisions. It needs to have that machine learning model running in the background to generate the data and know how to infer and how to gather information from it.
Mike Hrycyk (03:07):
Nat, do you want to disagree with this?
Nathaniel Couture (03:09):
No, that was actually a really, really good description. And yeah, I mean I often use them interchangeably. Most people do, and they really aren’t quite the same thing. One is a subset of the other. The machine learning process is much like when you’re teaching kids something: you’re spoon-feeding it information with the hope that it’ll recognize bits and pieces of that information at a later time. And the quality of that model really depends on how much good data you’re feeding it.
Mike Hrycyk (03:38):
I think that in today’s terms, we don’t have AI without machine learning. We haven’t developed a way to say, hey, you’re alive, you’re thinking, just go for it.
Nathaniel Couture (03:48):
Yeah, I think that’s probably a correct statement. I don’t know – I am not a machine learning or an AI specialist by background, but I think that would probably be considered true. I think pretty much any aspect of machine learning – so the large language models that we interact with, or the machine learning models that do various data analysis, similar to what we do here – requires us to feed it or perform some training action with a set of data. And then the way that the artificial intelligence implementation uses that data is a little different. I think your deep learning models, your regular machine learning models and your natural language models are all a little different. But yeah, I think it all applies.
Mike Hrycyk (04:35):
Agree or disagree, Sean?
Sean Wilson (04:37):
No, I think that’s right, and it’s an aggregation. So in order to get to actual artificial intelligence, you might have multiple machine learning models under the covers that all know how to process different types of data and can learn from the data that they’re inferring. So, take a large language model: that’s one part, and we train it on a subset of data and then we teach it, and then you might have multiple deep learning modules behind that. For every machine learning model, if it’s significantly advanced, you will actually have multiple deep learning modules that each know how to do pattern recognition on certain types of data, because that’s the core of it. You have to be able to look at this data, understand what it is, and then process it so you can learn about it.
I was just thinking as we were talking about it, it’s like learning a language as an adult versus as a kid. We think about intelligence in terms of how we learn as children. We just learn by seeing the world around us and interacting with it. But as an adult, when you try and learn a language, you have to learn all the grammar rules first, and then you learn the language by trying to apply all the rules. And what we’re doing with computers now in artificial intelligence and machine learning is really trying to teach them a language as an adult. We’re following all the rules to try and get it to understand specific things bit by bit. And each one of them – deep learning, machine learning, AI – is a progression of that all the way.
Mike Hrycyk (05:58):
The analogy, which isn’t perfect, but the one that I like – I was about to say it’s from the real world, but it’s from a movie, so that doesn’t really count – is from The Fifth Element. I forget what her name is, but when the main character wakes up, she’s a blank slate. She doesn’t know anything about humanity and what we’ve done, and she watches a whole bunch of TV on fast forward – and they’ve done this in multiple movies. It’s that absorbing of all that knowledge that gives her the context to understand what’s going on. Unfortunately, when you watch the history of humanity, it’s not very pleasant, so she’s not learning the best things about us. But that’s the analogy: AI doesn’t just wake up thinking. AI needs the learning to grow and become. Kind of like kids: they wake up as a blank slate, and they just cry and poop, and then they learn stuff, and the more they learn, the more functional they become.
Alright, so we sort of have a groundwork here, but what does AI have to do with testing? Is there any accuracy to the fears? This is the thing that fear mongers in QA say: hey, you’ve got HAL [9000] from 2001 [A Space Odyssey] who’s just going to take over everything. You’re just going to walk in and say, okay, AI, do this, walk out the other way, and you’re done with your software. Where do AI and testing fit together? Let’s start with you, Sean.
Sean Wilson (07:12):
So, I thought it was funny that you brought up HAL. I’m a big John Oliver fan, and he recently did a segment on his show where newscasters were reacting to artificial intelligence in the world, and they all brought up the Terminator as their example of artificial intelligence and why it scares them. So HAL, yeah, same idea. What does AI have to do with testing in that vision? I think that movie tells us more about testing of the AIs – maybe not enough testing was done to get to a HAL scenario. I don’t think that AI is going to take over testing. I think that, just like we’ve seen with automation and with every other evolution in software testing since I started, it changes what we do. It gives us new tools to do what we do, but it doesn’t stop people from being testers. I’ve only noticed more testers, not fewer, as software and the technologies we can bring in have progressed, because if we can use AI in our testing, we’re also using AI in our development, and we’re developing AI things for products and customers, and there’s a lot more testing to be done. So no, I don’t think it’s going to take over the test world.
Mike Hrycyk (08:16):
Alright, cool. Nat?
Nathaniel Couture (08:17):
Yeah, I mean AI and testing, I think realistically there’s probably a little bit of both. There are elements of software testing that may get taken over by AI tools, but for the most part, all I’m seeing is a more complex environment to do software testing in. Now you have a piece of software that may do something unpredictable; there’s a non-deterministic system built into your software. At least in the software that I’m working on, we have not only an integration between the software and the hardware, but software, a machine learning model, and hardware. You change the hardware, it affects the sensors, which changes the data that we’re receiving, which affects the software. We’ve just created a more complicated product to test. And I think it’s just added another layer of stuff that we have to do as software testers, and a whole area around qualification of models, security around them, and the way that training data has to be labeled. What we’re seeing is that if you incorrectly label something as you’re producing it – if it’s a vision model, or in our case walkthrough data, because it’s a metal detector, so we say, okay, you’ve carried this weapon through at this location, at this speed – if you mislabel any of those parameters, your model is then likely to incorrectly interpret that information, for good or for bad. And it’s making a lot more work. But then on the flip side, you have tools at your disposal that help you do all kinds of stuff. As a testing consultant, you can generate test plans, test cases, all kinds of stuff. They’re not perfect by any means, but just like in any job, you can save yourself some time by using them. So, I don’t know, you get a bit of extra horsepower, but at the same time, you get a more complex product. So yeah, it’s interesting for sure.
Sean Wilson (10:15):
I have a really interesting tangential story about what you’re talking about with bad labeling. They were training an AI vision system to detect cancerous cells, and they trained it and trained it and trained it. They got it to a point where a hundred percent of the time it could detect the cancerous cells. Unfortunately, very often when you build a system, you don’t build transparency in, so you can’t always see how it makes a decision. So they started using it, and they found that its failure rate was super high. As they looked into the transparency, they realized that it was detecting cancerous cells when there was a ruler in the picture, because when the doctor had identified a cancerous cell, he would use a ruler to measure it. So all the pictures they had of these cancerous cells had an actual ruler in them. The AI was training itself on absolutely the wrong thing, but because it was an unsupervised model where they weren’t human-tagging everything individually, they didn’t notice the failure. And you’re right, I mean this type of non-deterministic system that might be doing something that we expect it to do is really hard to test, because how do you know – even if it makes the right decision, how do you know how it got there? So again, I think it just opens up new opportunities for testing and new ways that things can go badly wrong.
Mike Hrycyk (11:23):
And AI doesn’t have common sense in any way that we relate to it. We might often think of the tester as the common sense on a team, because the focus of developers often keeps them from thinking about the things around the edges, and that’s one of the hedges that a tester brings. But an AI doesn’t have a concept like that in any way. So having AI be your tester means you’re not adding common sense to it. The ruler becomes part of it: you have to have a ruler for it to be cancer.
Nathaniel Couture (11:53):
Yeah, that’s a really interesting one.
Sean Wilson (11:57):
It scares you more about what AI can do, in terms of the real world, because it can get the right answer without you knowing how it got there. And you have to be able to test for those things. You have to know how to do that.
Mike Hrycyk (12:09):
Interesting related story that I heard. In Japan, there was a lot of concern around germs and so on at a bakery, and it was taking a lot of time to pick bread out of the bread counter, bring it over, and then ring in the right charge, because they had all of these different baked goods. So someone decided, well, we can automate this. And they said, yeah, but how are you going to know which bread item was picked? So they developed AI that could drive picking up the bread, moving it to the cash, and charging for it, so that no human had to touch it until the client got to pick up their bread and eat it. There were a hundred different types of these bread products or something like that, and they built an AI that could tell them apart, and it was working, doing a really good job, and adding all these efficiencies to the store. And then someone got really clever as they started using it, and they found a few other things it could do. One of them was that, because it was taught to recognize bread, it could recognize cancer cells at a really high success rate. So they’ve pivoted this software and the AI, and it’s now being used in lab detection of cancer. That’s just really cool.
Nathaniel Couture (13:34):
I mean it’s really, really neat. You look at anomaly detection as an example of its use. You just show it properly working software over a long period of time; if it’s not changing in any way, technically it should be able to detect a change in that pattern. So there are some areas, like long-standing monitoring – i.e., regression-testing kinds of things – where it’s going to be in every product. So, one way or the other, we’re going to be facing it as software testers.
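To make the anomaly-detection idea concrete, here is a minimal sketch of flagging a change in an otherwise stable monitoring metric using a rolling baseline. The metric, window size, and threshold are illustrative only and not taken from any specific product or tool.

```python
# Minimal sketch: flag anomalies in a stable monitoring metric
# (e.g., latency from a long-running regression environment) by
# comparing each new sample to a rolling mean and standard deviation.
from statistics import mean, stdev

def detect_anomalies(samples, window=30, z_threshold=3.0):
    """Return indices of samples that deviate sharply from the recent baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is an anomaly.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        z = abs(samples[i] - mu) / sigma
        if z > z_threshold:
            anomalies.append(i)
    return anomalies

# Example: a latency series that is stable until a regression creeps in.
latency_ms = [102, 99, 101, 100, 98, 103, 101] * 5 + [160, 158, 162]
print(detect_anomalies(latency_ms))  # indices of the sudden jump
```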
Mike Hrycyk (14:05):
Alright, moving on to the next question. One of the giant buzzwords of 2023 was ChatGPT, which made a giant splash and did a whole bunch of things. Do either of you have a story about playing with ChatGPT?
Sean Wilson (14:18):
So I have, not with ChatGPT itself, but with CodiumAI, which is like ChatGPT for programmers, effectively. Just recently at CES, one of the customers came in and was talking to our team about integrating AI into the system. One of our lead engineers was there, and back in his hotel room, he brought up a model with Visual Studio Code and CodiumAI. By Wednesday, he was demonstrating in the software the ability to take a requirement in text, put it into the CodiumAI system, and have it generate an automated test directly from the requirement. Now, CodiumAI is like ChatGPT in that it’s a large language model based on programming examples, so it was able to do this very effectively. And it was something that the engineer just tried – huh, I wonder if that would work – and it was able to do it pretty easily in this system. But that’s about it for me and ChatGPT from a testing example.
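As a rough illustration of the requirement-to-test workflow Sean describes, the sketch below shows the general shape: feed a textual requirement into a model and treat the generated test as a draft to review. The `generate_with_llm` helper, the requirement text, and the prompt are all hypothetical stand-ins; CodiumAI’s real integration runs inside the IDE.

```python
# Sketch of "requirement text in, automated test out."
def generate_with_llm(prompt: str) -> str:
    # Hypothetical stand-in: wire this to whatever LLM API or IDE plugin you use.
    raise NotImplementedError

requirement = (
    "The login endpoint must lock an account after five consecutive "
    "failed password attempts and return HTTP 423."
)

prompt = (
    "You are a test engineer. Write a pytest test for the following requirement. "
    "Cover the failure-count boundary and assert the HTTP status code.\n\n"
    f"Requirement: {requirement}"
)

# The output is a draft, not a finished test: run it, review the assertions,
# and make sure it actually exercises the boundary (4 vs. 5 failed attempts)
# rather than just restating the requirement.
generated_test = generate_with_llm(prompt)
print(generated_test)
```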
Mike Hrycyk (15:12):
And that is an interesting point that I didn’t put in my list of questions but did want to mention: one of the big things that AI is being used for is to generate code, the starter code. And I think that’s fine. It’s a time saver. The developer goes back in, fixes it, delivers it; the developer is still necessary. But the risk I see – and there are all these other examples, like the one where they were sending tourists to soup kitchens as great restaurants because it didn’t make the right connections – is that without common sense, the AI could generate code with something in it that seemed okay to the developer but is doing something we’re not looking for. And so that’s a different dimension for testers, because testers do assume a certain level of common sense on the part of developers, but now there’s this thing that doesn’t have common sense that might be making an assumption based on its learning and doing something really odd and unexpected. One of my phrases is testing with your eyes open – well, now you have to test with your eyes wide open, because there are things that you may not know and may not see. Nat, did you have an example?
Nathaniel Couture (16:19):
Yeah, I mean I played with it like most people. It was kind of two parts, and I did do the coding stuff. We were implementing a robotic system to help us do our walkthroughs in an automated way, and a few other test apparatuses. I have a computer science degree, but I don’t program every day, and I didn’t want to ask someone to write the code for me. So I said, well, why don’t I just see what ChatGPT will give me for code? I gave it a pretty generic description of the thing I was trying to implement, the language, and a few details. And it did a pretty good job. It wrote some code, but it didn’t run initially. Then, as I got more specific, it got very close to what I needed, to the point where, when I moved it over into my coding environment and it gave me an error, I would just copy and paste the error back into ChatGPT and say, “this code that you wrote me gave me this error. Can you fix it?” And it would actually regenerate the code and say, oh, sorry about that, and then correct it. Long story short, it gets you 90% of the way pretty darn quick. I know that some of the programmers I’ve worked with recently do make use of the suggestive automated coding routines, but I don’t know how much code is being generated by these. I don’t think it’s being relied upon that much, and certainly not in the quantities that would change a tester’s job yet. But I think if it did get to a point where large amounts of code were being produced, you might not know how that’s being written. Are they accounting for all of the exceptions? Because the context isn’t there. It can do a very good job, but you have to essentially tell it very specifically what you want, and at that point you’re writing a novel in order to get exactly what you want from it. But it will create it, which is kind of neat. Otherwise, it makes these weird assumptions. And that’s where I think, if we do see more and more of this code being generated and put into products, we’re going to have all kinds of weird things as testers that we’re going to bump into. Where we’re like, why did they do that? Why does it behave this way? And there’ll be no rhyme or reason. Whereas normally it’s like, oh, they mis-implemented the boundary – instead of ‘less than’ they used ‘less than or equal.’ But now it could be all kinds of weird stuff.
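Nat’s copy-the-error-back loop can be sketched as a small script. The `ask_llm` function is a hypothetical stand-in for whatever chat-style model API you use; in his case the loop was done by hand in the ChatGPT window.

```python
# Sketch of the manual loop described above, automated: generate code,
# try to run it, and feed any error message back to the model.
import os
import subprocess
import tempfile

def ask_llm(messages: list[dict]) -> str:
    # Hypothetical stand-in for a chat-style LLM call.
    raise NotImplementedError

def generate_and_repair(task: str, max_rounds: int = 3) -> str:
    """Generate code for `task`, run it, and feed any error back to the model."""
    messages = [{"role": "user",
                 "content": f"Write a Python script that {task}. Return only code."}]
    code = ""
    for _ in range(max_rounds):
        code = ask_llm(messages)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(["python", path], capture_output=True, text=True)
        os.unlink(path)
        if result.returncode == 0:
            return code  # it ran; a human still has to review what it actually does
        messages += [
            {"role": "assistant", "content": code},
            {"role": "user",
             "content": f"This code you wrote gave me this error. Can you fix it?\n{result.stderr}"},
        ]
    return code  # best effort after max_rounds
```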
And then, from a consulting perspective, like you said, you often get asked to produce test plans and test strategy documents, and 80% is boilerplate, and then you go in and change some stuff. From that perspective, where you’re working for an organization that requires a lot of documentation on your testing, I think things like ChatGPT are hugely beneficial in terms of saving time creating all this documentation. That’s the stuff that I always hated doing as a tester, and it takes a ton of the labour out of it. You still have to check it, but it does a pretty bang-up job. I’d love to see more test tools build that in. I don’t know if they have or not, but you should be able to describe what a feature is, or take a requirement and import it, and it should be able to automatically generate a good portion of your tests. Anyway, that’s my thinking around where the tool vendors should go, but maybe they have, I don’t know.
Mike Hrycyk (19:18):
I haven’t gone to that depth. I’ve played with it around strategies and test plans, and it comes back with a nice, comprehensive, fairly well-written generic document that you can then use and add your know-how to, right? The whole thing that a testing expert does is the risk analysis, thinking about the things that you really have to focus on, and so on. So it’s writing that part for us. It’s not that dissimilar to having an accelerator template where you already have a lot of that boilerplate, and then you layer in the things that are specific to you. So it is kind of interesting that way.
Sean Wilson (19:51):
There is a risk to raise as well. And I think, Nat, with what you described, as you get better and better code out of it, you have to provide it more and more information so that it can get more specific. A bunch of Samsung employees released intellectual property to the world through ChatGPT because they were asking it very detailed questions to help them resolve a problem.
Mike Hrycyk (20:10):
And it learned from them.
Sean Wilson (20:12):
It absolutely learned from them. And based on what they were actually asking it, they released intellectual property, which obviously did not make Samsung very happy. And that is a risk, because the machine learning systems that we’re talking about are learning from the questions you ask as well. Depending on how their data collection and aggregation systems work, or the licensing that your company has with them – unless you have brought a version of it in-house and you’re training it yourself – you have to worry about IP getting out, because when you ask it a question, you’re giving it answers.
Mike Hrycyk (20:42):
And you’ve sort of raised two points here for me. One is, Nat, you talked about having to provide more and more requirements. One parallel to that is the early days when we started shipping work offshore, right? You had to build out your requirements more and more and more. Now you’re not shipping it offshore, you’re shipping it into the digital. So there are some parallels there that we could think about. But the other parallel I’m thinking about is the shift-left movement and the fact that we started using Cucumber – started, it’s old now – but going to more plain-language things so that we can move some of the duties back towards the BAs and the people who understand the business processes. And it’s interesting that this movement might push us a little away from our agile kind of idea of being document-light to being a little bit more document-heavy again, so we’re defining things a little more fully, so that when we hand it to AI, it can do a better job. I don’t know if that’s the path it’s going to take. I have no idea whether the requirements folks have thought about, hey, can I use AI to help generate my requirements? But it’s interesting.
Sean Wilson (21:44):
That’s funny, because the assumption being made in your statement, if I can, is that nothing has been entered into the system about testing before you get it for testing. I think we would actually wind up being more agile if we shifted left. If we had an AI system that we were using, and it was internal to our own systems, then as the developers were entering the requirements for the code, that would all be there. So you would actually have less documentation again by the time you got to the test team, who would now have a much better idea of what testing needed to be done, because it would have been trained all the way through. I wasn’t thinking of just throwing it over the fence so that only QA is getting it and only QA is using this. When you said shift left, I was thinking, okay, well let’s go all the way left. If the developers are also using it to generate some of the application code, that information is already entered in; I just need to ask the question about testing. But you raise a very interesting point: if I do have to start from nothing, am I not making the system worse, because now I’ve got to ask it only about the testing side and I have no idea what came before? The best meme I saw on ChatGPT is somebody who takes five bullet points and asks ChatGPT to write an email from them, and then somebody else who takes the email and asks ChatGPT to break it down into bullet points. And it feels like that’s the developer-to-tester crossover. If both systems are using AI and not talking to each other about what they’re doing, maybe there’s an opportunity to shift that.
Mike Hrycyk (23:03):
How did that turn out?
Sean Wilson (23:04):
Works pretty well actually.
Mike Hrycyk (23:06):
Okay. Nat, you look like you had something to add.
Nathaniel Couture (23:08):
Yeah, I was following the back and forth. I don’t know. I mean, if you’re using it for testing, you’re probably using it for requirements – you’re probably using it at all stages. I think every actor who’s playing a role in engineering a product is probably going to try to use it in some way, shape or form. What we need is a holistic environment that takes in each of those roles, thinks about them through each lens, figures out what AI can do for each role within the team, and then puts them all together. I think you would end up with a better product in the end, because right from the elicitation of requirements, it might ask you things that you wouldn’t have accounted for. If you’re building a metal detector, for example, it might give you some scenarios that you didn’t think of as a product manager. And then as it goes to the engineers, it might – I don’t know, I don’t want to say do a better job, but it would approach things maybe a little differently, and it would apply a framework across all product engineering groups a little more equally. That way you don’t have some teams that are performing super well, with high-quality output every time, and another team that’s really underperforming. Like any process, if you’re applying it uniformly across different teams, you’re hopefully going to get similar outcomes. But yeah, it can screw up just as well as it can help you. It’s like any tool, right?
Mike Hrycyk (24:32):
Alright, so I’m going to move on to the next question. I know, Nat, that part of your product base is using machine learning, and maybe AI depending on how we define it, to help your sensors and systems understand what might be a weapon. So we know that you’re doing that. Is your team, the QA team, leveraging AI in any way yet?
Nathaniel Couture (24:51):
We are not, through any formal tool. Yeah, I would say no. And I’d say across the organization, not a whole lot of AI is being used in development or testing, despite the fact that we have machine learning models deployed in our product. So we have the ability to build all kinds of stuff. I’ve got ideas on how we might be able to apply it, and I have tested some tools to try to do automated defect finding and stuff like that. But they’ve taken so much time to set up and provided so little value to this point that I haven’t seen any positive ROI from it yet.
Mike Hrycyk (25:33):
Commercially, most of the claims I’ve seen from tools are around self-healing, and maybe auto-generating code for your automation, which I know a number of people are doing and they like it. It still requires the same sort of effort developers put in: you have to go back in and fix it. But there’s the other stuff, where you’ve built your visual testing, like Applitools, and they have enough knowledge around objects that it can self-heal when things change. I don’t know how incredibly successful that is, but it’s interesting. So Sean, coming back to the actual original question, I know that you’ve been playing a little bit. How are you guys using AI at WindRiver?
Sean Wilson (26:09):
So, we have a lot of AI systems we’re developing and releasing, particularly at the edge, but within quality we are doing some investigation into, not code generation or test case generation, but where you have a significant data set of automated tests and executions, can we use an AI system to help us identify what tests to run first? Can we prioritize and select based on available time? There’s a ton of research – Microsoft has published a lot of papers about this – and there are some products that claim to have built this into their product line, though there are some challenges with that that I can talk about. We’ve been looking because we have, depending on the product line, a lot of automated tests and not always a lot of time; you can’t execute a hundred percent of your automated tests every time. So could we use an AI system to tell us: we’re doing a build, we’ve got 30 minutes during that build process to run automated tests, what automated tests should be run before we say the build is adequate, and what level of automated tests are those? And then, before we release a build to QA, we’ve got all these tests, what kind of window are we going to pick? Instead of having a human tell us, “hey, I want to generally, randomly create a smoke test for my entire application, I don’t know or care what’s actually changed, just run these tests,” which is what we do now, can we have the system identify, based on changes coming in from development or requirements that have been touched, the best tests to run, in that order, within that timeframe? So there’s a lot of stuff that we can do. And I’ve been reading papers from Microsoft and other teams that have been doing this around test case recommendation, then prioritization, and then ultimately, when you trust it, selection – letting it select the tests to run based on change or any other criteria that you might want to put in.
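A minimal sketch of the time-budgeted selection problem Sean describes: assume some model has already produced a failure likelihood per test for the incoming change, then greedily pick the highest value-per-minute tests that fit in the window. The data, the scoring heuristic, and the 30-minute budget are illustrative, not from any specific product.

```python
# Minimal sketch of time-budgeted test selection: given a failure
# probability per test (assumed to come from some trained model) and
# historical run times, greedily pick the best value-per-minute tests
# that fit in the available window.
from dataclasses import dataclass

@dataclass
class Test:
    name: str
    fail_probability: float  # model's prediction for this change set
    duration_min: float      # historical average runtime

def select_tests(tests: list[Test], budget_min: float) -> list[Test]:
    ranked = sorted(tests, key=lambda t: t.fail_probability / t.duration_min, reverse=True)
    selected, used = [], 0.0
    for t in ranked:
        if used + t.duration_min <= budget_min:
            selected.append(t)
            used += t.duration_min
    return selected

suite = [
    Test("login_smoke", 0.30, 2),
    Test("checkout_regression", 0.10, 12),
    Test("api_contract", 0.25, 5),
    Test("ui_full_sweep", 0.05, 25),
]
for t in select_tests(suite, budget_min=30):
    print(t.name)  # prints the tests chosen for the 30-minute window
```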
Mike Hrycyk (27:54):
Yeah, there have been discussions over time about using statistical modelling to choose where you run your regression and smoke tests, based on latent fails and a number of other criteria. Who better than AI to keep track of that over time, right? Because we’re all going to have confirmation bias toward the thing that bit us in the ass last, and that’s not necessarily the best place to focus your time. That’s interesting. And I like that. That’s a good point.
Sean Wilson (28:17):
The human-based algorithm – and it is based on what bit us in the ass last time – is incredibly successful with an incredibly senior tester who knows the team, knows the developers, and knows what they did. And it all falls apart when that person is not in that job. If you’ve been working with the product for a while, I can look at the team, I know basically what they did today, and I can tell you where to go test, because I know they’re having a very bad day – maybe test that product area first. But being able to look at the last hundred builds, we can identify what automated tests have run, what automated tests have failed, what code was changed. Can I look at the code coming in, or the requirements that have been touched, and select from that? Before WindRiver I was with Ubisoft, and the La Forge team there published some incredible stats about a tool called Clever that looked at developer commits before they were put in and tracked how often it could identify potential failure in developer code before the developer hit submit. And it was very successful. Four years in, they were at 95% accuracy in prediction, but the developers often ignored the advice. So the system had to go into the Jira at the back end and look for code that got tagged with a bug that was fixed and then checked into the same area the developer was working in in the first place. So the AI system remembered: I suggested based on this change that there was a bug, you ran tests, and eventually you wrote code to fix some code in that same module. “So I was right, and you were wrong, ha-ha.” But that was a way of training the model as it went forward, and the longer you’ve been running a project, whether it’s manual or automated tests, the better data you have for the machine learning model to actually interpret and learn from, right?
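As a rough illustration of the historical signal Sean is describing (not the actual Clever tool), the sketch below ranks tests by how often they have failed in past builds that touched the same files as the incoming change. The build history structure and file names are invented for illustration.

```python
# Sketch: for an incoming change, surface the tests that have failed most
# often in past builds touching the same files. Data is illustrative.
from collections import Counter

# (files_changed, tests_failed) per historical build
history = [
    ({"auth/login.c", "auth/session.c"}, {"test_login", "test_timeout"}),
    ({"net/socket.c"}, {"test_reconnect"}),
    ({"auth/login.c"}, {"test_login"}),
]

def rank_tests_for_change(changed_files: set[str]) -> list[str]:
    scores = Counter()
    for files, failed_tests in history:
        if files & changed_files:        # this past build touched the same area
            scores.update(failed_tests)  # its failures count as evidence
    return [test for test, _ in scores.most_common()]

print(rank_tests_for_change({"auth/login.c"}))
# -> ['test_login', 'test_timeout']
```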
Mike Hrycyk (30:01):
So wait, are you saying that statistically they prove that when developers write code, that’s where bugs come from?
Sean Wilson (30:08):
You know, it’s incredibly shocking. I say this on a regular basis, there’s only so fast we can find the bugs, and you can put them into the code a lot faster. And the developers in the room don’t like that approach. Hang on a sec, don’t say that.
Nathaniel Couture (30:23):
Well, the other thing is the AI wouldn’t be shy about pointing out that 90% of your bugs come from one individual, or something like that. It’s like, oh, the source of your bugs is right here.
Sean Wilson (30:34):
So, there are actual HR concerns around that point of it though. So you have to be very careful with that. You have to be very generic in the dataset that you’re using or it can create bias externally as well.
Mike Hrycyk (30:46):
But if you think about it, AI can go to the next level, which is not that it’s developer X that’s directly causing a bug, but that every time developer X, Y, or Z makes a code change that interacts with something developer A did, there’s a problem. It’s really difficult for us to make that jump and figure out, you know what, it’s not X, Y or Z, it’s A, right? But AI might be able to make that connection and that jump, and that’s really sexy to me. Alright, time to wrap up. I’m going to go with this question: what are your fears around AI in our industry, and in the world in general? Go conspiracy nuts if you want to. And then just cap it off at the end with a simple answer to: should testers be afraid for their jobs?
Nathaniel Couture (31:30):
Yeah, I mean you’ve packed a lot into that one question. I’ll answer the last one first: I don’t think testers need to be worried. I think the world of software just got vastly more complicated through the integration of AI into it. And what’s scary is that problems in software were often hidden; now the software itself, even if it’s doing what it should be doing, could be doing the wrong thing. It could be inherently built into the machine learning models that are embedded within it, and you won’t uncover it for God knows how long. So it’s just a deeper level of hidden defects, in my opinion. And then the other thing – I was reading a little bit about this because here we’re obsessed with trying to get our tests run quickly, since we have to certify our product because it’s an appliance – I was looking at models for how companies release changes more quickly. One of the leaders in this space is Tesla, and although I’m not a Tesla fanboy, I think they do that really well. But when you look at how they achieve it, it’s by trusting a lot of AI and automation within their processes, to the extent that even at layers of management there is essentially an AI helping them make these decisions very, very rapidly. And that scares me. It means that companies can move much, much quicker, but at what expense? So far, I think we’ve seen a lot of good come from it, but it scares me to what extent we’re going to allow AI to make decisions on our behalf.
Mike Hrycyk (33:01):
Alright, Sean.
Sean Wilson (33:04):
AI is fascinating and terrifying simultaneously. Nat, you hit on an incredibly good point: they’re using it a lot to help them make faster decisions. And when we’re talking about safety-certified software for airplanes, jets, cars, self-driving cars, Mars rovers, you have to be concerned. You have to think of those cases and say, is this the place where I want a non-human making a decision about what we test and how we test and whether it should go out? Because companies who are doing that testing, and who are also in some ways responsible for the verification, have an incentive to move fast, not move well. If you take a look at the Boeing situation, that can be a direct relation, right? You go from: we tried to move fast, people didn’t get trained up, we allowed a system to work the way that it works, and nobody really tested it the way that it should have been. That’s a software test failure, from a software development failure, and then an industry failure in not doing that verification. That scares me from an AI perspective, because it only advances that problem on the other side. It’s also what is going to get us to the place where we can do so many fascinating things. So, I think AI is the way we have to go. I think it’s fascinating to get involved in it, and the more I play with it, the more I want to. There’s a part of me that did some philosophy back in the day instead of computer science – which would’ve been more useful to my career, I don’t know – and it’s that part of me that asks, is this the right thing or the necessary thing? And maybe I shouldn’t try to balance those things. But should testers be afraid for their jobs? If you want to stay in the exact job that you’re doing today and that is all you do – maybe it’s game testing with a controller in your hand, or maybe it’s software testing where you’re just looking for bugs in a UI – maybe that job is not going to be there 10 years from now. But there will definitely be jobs in testing. And I think this has been the evolution of software in every industry we’ve seen: as it advances, the job will change, but the job will still be there.
Mike Hrycyk (35:05):
Yeah. One of the things I think the tester role has to embrace more than anything else is asking hard questions. We’re not data scientists; we’re not going to know if the dataset is actually the right thing, but we can ask people to justify how they chose the dataset. That shift-left, three amigos idea really brings in at least the questioning of: are we doing it the right way? Are we doing the right thing? The other thought I had is that Asimov’s rules of robotics were core and at the bottom of everything – and I’m not saying that AI should have those – but if we think about the Boeing example and drilling extra holes in the wrong places in a fuselage, which seems bad in an airtight system, maybe we should be figuring out a way to put into AI: here are some of the things that are sacrosanct. Tell the AI right away, don’t release a product that’s not airtight, and just have that as a rule. That could be its common sense. I have no idea if this would work, but that could be its idea of common sense, so that when it releases a product and thinks about what it’s released, it’s like: there are holes in the fuselage, that doesn’t fit rule number one.
Sean Wilson (36:08):
That actually does work. Human-trained rules in an algorithm are a very common way to do it, because you can add rules right up front, and you gave a very obvious common sense rule. The problem with common sense rules is that they’re common sense, and we often forget them – why would I need to tell it to make the thing airtight? What’s the old saying? We keep trying to make idiot-proof software; the universe keeps building bigger idiots. I guarantee you, anything we can think of to put into an AI as a common rule, we will find some other way to do it, or some other way to defeat it, or some other problem that comes around it.
Nathaniel Couture (36:43):
<laughter> A way to defeat it!
Sean Wilson (36:43):
Nat, I was thinking about your example, in what you’re doing. I mean, guaranteed some bad actor who wants to bring a weapon to a sporting event is asking an AI system right now: “How do I defeat that model of radar?” Maybe I watch too many heist movies on TV, but I’m thinking, how would I defeat it if I had an AI system that I could ask how to go break into that thing and defeat it? Oh, that’d be cool. That sounds like a fun test case to run, but –
Mike Hrycyk (37:08):
Nat, don’t put your model onto ChatGPT.
Nathaniel Couture (37:10):
No, no. But I actually did type similar searches into ChatGPT for that reason. I think it gives you ideas, and it comes back to your point, Mike, where it’s our job to ask questions. It can do a good job of coming up with questions, or you can use it as a starting point. “What could go wrong?” is a good starting point: you describe a scenario, what could go wrong if we do this, and it’ll spit out all kinds of stuff. Is it useful all the time? I don’t know, but it might be helpful. But we’re the humans in the loop. We should be – I mean, we tend to do that in the craft of software testing, but seldom does anyone else in the organization, unfortunately.
Mike Hrycyk (37:54):
That’s the ideation question that’s at the heart of every test case – what could go wrong, right?
Sean Wilson (38:00):
Testing is destructive. When automation first started coming in, I used to worry that developers were going to take over test jobs – not that they wanted them, but because they had to write automated test code. The difference is that engineers who are building code tend to be very creation-based; they are doing an act of creation in developing an application. Testers have a mindset that is destruction, not creation. And that will always be the biggest difference. When we’re looking at a test case and saying, what could go wrong, how can I break it, we are bringing more to it. We will use AI very differently than developers, who are asking it: how do I create this? And as long as we keep that mindset – the craft of testing, as you said, Nat – alive and well, and keep people thinking that way, we’ll probably be okay. I hope.
Mike Hrycyk (38:45):
And with that, I’m going to end our discussion. I’m going to thank our panel for joining us for a really great discussion about AI. And I would like to say this is going to be one of my favourites. I’ve listened to two or three AI IT-centric discussion panels, and it’s like they invited politicians – it’s all “I don’t know, but maybe this; I don’t know, but maybe that.” This one seemed a lot more real and a lot more interesting, and I think it’s going to draw people in. So thank you for that. Thank you to our listeners for tuning in.
If you have anything you’d like to add to our conversation, we’d love to hear your feedback, comments and questions. You can find us at @PLATOTesting on X, LinkedIn, and Facebook or on our website. You can find links to all of our social media and website in the episode description. If anyone out there wants to join one of our podcast panels or has a topic they’d like us to address, please reach out. And, if you’re enjoying our conversations about everything software testing, we’d love it if you could rate and review PLATO Panel Talks on whatever platform you’re listening on. Thank you again for listening, and we’ll talk to you again next time.