Transcript
Gorbachov: I would like to start this presentation by showing you a couple of images. On this one, we see a manual toothbrush on the left side and an automatic electric one on the right. On the next image, the left side shows a vacuum cleaner that you hold in your hands: you have to walk around the apartment and do the job yourself. On the right side, we have a robot vacuum. This next one might be a little bit more confusing.
On the left, you have a plate in a sink, and on the right, you have a dishwasher. I have a question for you: what do you think these three images have in common? You might have guessed correctly that the objects on the right side perform their corresponding chores in a more autonomous, effective, and efficient way: they reduce effort, they save time, and they ensure better outcomes. To maintain good health, we need to perform these tasks consistently. It’s very similar to a professional environment where, for example, developers have their own set of essential chores that they need to perform to maintain a healthy code base and workflow: things like code reviews, writing documentation, refactoring code, writing tests, and dealing with security and accessibility.
One chore that stands out due to its frequency and impact is code migration, which is what I’ll focus on. This is the visual representation of what I’ll be discussing: a robotic arm that takes one type of file from one pile and puts it in another pile, and it does it efficiently and systematically instead of us doing it manually. Just like other chores, this task is not always the most enjoyable one. Many people see these chores as boring and mundane. Of course, we can find people who love vacuuming, or who love brushing teeth or converting tests, but that’s usually very rare. These chores are crucial for maintaining the overall system, whether it’s your dental health, room cleanliness, or code health, so it’s just as important to know how to perform them efficiently. In this talk, I’ll share our approach to dealing with complex conversions. We used a method that wasn’t very common in the industry, and it helped developers save a lot of time that they could spend building great features.
My talk title is Hybrid Approach for Large-Scale Migrations from Enzyme to React Testing Library. We used AI here, another key technology. I would like us to also keep this question in mind: is AI a genuinely useful tool, or is it another overhyped solution when it comes to more complex things like code generation? Can it truly handle the intricacies of a complex code base like the one we have at Slack? My name is Sergii Gorbachov. I work at Slack as a staff engineer. I’m a part of the developer experience department.
Table of Contents
This is what we’re going to talk about. Problem statement, current state of things, approaches and solutions, evaluation and impact, and later I’ll share more information about important findings.
Enzyme and RTL – The Basics
What are Enzyme and RTL? Let’s start with the basics. They are test utilities that allow testing React. React is a JavaScript library used to build user interfaces. Why is it important? Why did we even start this project in the first place? Enzyme supports React 17 or lower, and in order to upgrade to the new version you need to either migrate all the tests to the new framework or lose test coverage. There is a real necessity to migrate all the tests to RTL, otherwise you lose the foundational confidence layer in your code base, and you prevent developers from moving forward with more modern testing approaches. Slack’s daily interactions depend on performance, responsiveness, and a smooth user experience. Meeting these standards was crucial because it reinforces Slack’s reputation as a high-performance communication tool. Migrating to React 18 was very important and was a high-priority goal for the company. Of course, we had to migrate all the tests as part of it.
Problem Statement
At Slack, where did we start? Initially we had 15,500 tests to convert, where one test takes about 30 to 45 minutes, and overall we estimated it would take more than 10,000 engineering hours. All those tests were written over more than 5 years by hundreds of frontend developers. As for weekly downloads, the state of the problem as of November 2024: Enzyme is being downloaded 1.7 million times weekly, enzyme-adapter-react-17 about 500,000 times weekly, and RTL about 10 million times. We can see that it’s not just a Slack problem; even today, many other companies still use this old framework. If we assume that all of those projects that use Enzyme have 10 test cases each, it would amount to about 20 million test cases. That’s a lot of effort to convert those tests. Of course, not all of those projects are valid and not all of those tests will be converted, but the scale here is quite significant.
Current State
What did the present look like when we started? There were no adapters; we couldn’t run our existing Enzyme tests with the newer version of React. There was nothing open source; we couldn’t find any automation tool that we could maybe adapt for our code base. There were no previous articles on working automated solutions either. At that stage, nothing was available for us to do it automatically. We started questioning ourselves: are we just doing something wrong, or is it maybe too difficult? Usually in such cases you don’t want to create anything custom, you want to reuse whatever other people have built. You don’t want to reinvent the wheel. In our case, it felt like we had to invent the wheel for this specific problem. In this talk I’ll walk you through Slack’s approach to this problem.
Why might you be interested? It’s applicable to complex conversions. It doesn’t necessarily have to be frontend testing related; it could be any complex migration, let’s say moving from one CI system to another, converting code from one programming language to another, or doing version updates. These migrations are usually very time consuming and error prone, especially if you have large legacy code bases. Also, we used an innovative approach that I would like to share with you later. Maybe you still use Enzyme and want to move to RTL, and you want to learn how Slack did it. It’s a real-life successful case for LLMs, where we saved a lot of time for developers and created a scalable solution for very similar migrations in the future.
Approaches, Solutions, and Timelines
Approaches, solutions, and timelines. Responsibilities: I was a part of a frontend test frameworks team, and we were tasked to first lead the migration of tests owned by more than 50 teams with more than 150 frontend developers. We also had to convert tests ourselves. The most fun part: we had to automate and optimize this process. Challenges. The first challenge that we faced was, how do you divide work fairly and organize this whole process for more than 150 developers in 50 teams? We had to do it in a way that presented the least friction for all of them so they could still be productive and do their regular work. Workload distribution specifically was challenging because it can impact efficiency, collaboration, and morale in large teams. Uneven workloads can lead to bottlenecks: let’s say one team has a lot of tests and is overwhelmed, whereas another one is underutilized. We had to avoid that. In the next slides, I’ll talk about how we divided the work.
Feature ownership is one of the documents that we have at Slack where we can see who owns what, but there were challenges with that too. Not granular enough: let’s say we have a feature that is owned by two teams, and you don’t know which department to assign those files to. Out of date: team names change, or there is organizational restructuring, and people don’t always update this document, so we didn’t know who owned a specific file that we found. Missing assignment: legacy features that were created six years ago, when no one really thought about ownership, so a feature is unowned but active. Yes, it happens at Slack. Disproportionate ownership: some of the features are very cross-team. Let’s say messaging is a very powerful part of the Slack app, and one core team owns it while other teams contribute to it and add their functionality. It turned out that one team of 10 to 15 developers owned 30% of the tests in the code base, and we couldn’t really ask them to do nothing but conversions for the next 2 to 3 years.
Test authorship: we also looked at who actually changed the test code. Partial authorship: we have a bunch of people from various departments who touched a file, so we didn’t really know who should own and convert that file. Change of teams: one person was in one department, then moved to another, and all of those files would go with that person to the new department. Large refactoring projects: someone changed one line of code in hundreds of files, and it doesn’t mean that that person should be responsible for converting the tests. How did we, in the end, divide this work for 150-plus developers in all of those teams? We aligned workload distribution with ownership and authorship, and that allowed us to first ensure that each department was assigned the code that it is responsible for.
Second, they got the code that they were actually familiar with, because they had touched that code at some point. We enabled developers to maintain a balance between doing their migration tasks, which were very important for the upgrade, and doing their regular work, where their managers were asking them to ship features or whatever else they were supposed to do. We divided the workload evenly across every single developer.
Frontend developers who actively wrote React had to convert 10 test cases in Q3 and 33 test cases in Q4 and Q1. We didn’t strictly follow the usual division by departments, teams, or organizations. Instead, we focused more on direct interactions with the developers. We bypassed the regular hierarchy in the company. We had one large team, and everyone knew that they had to contribute a little bit and convert those tests. We all had this idea in mind that we would first gain better test quality from it, and then at some point get exciting new React 18 features. If someone, for example, didn’t do that little bit of work themselves, it would essentially block the rest of the company. That also motivated people to contribute. We also offered very good progress tracking by person, by team, by multiple teams, by department, and by sub-department, so that the developers who did those migrations could make sense of the progress. Flexibility was key here.
Now I’m going to talk about the optimization and automation process. The first quarter started in August and ended in October 2023. We had 15,500 test cases at the beginning of August. At the end of October, we had 13,000 tests, so we converted 2,500 test cases. You see a graph here; that’s pretty much us slowly, casually walking towards our goal. At that point we converted things manually and just investigated what to do next. Zero percent was automated. Next, November 2023 to January 2024, Enzyme tests remaining.
At the beginning of November, we had 13,000 tests; at the end of January, we had 8,000 tests. There is a burndown chart, and it looks pretty good, we have some decline. We converted 5,000 tests. How did we do it? During that period we used ASTs, or abstract syntax trees. We created a codemod, a tool that automatically converted the tests. How it works: you represent code as a tree structure, you query some of those nodes, and you create rules to convert pattern A to pattern B; there is a small sketch of such a rule below. We were no longer walking towards the goal, we were riding a bike, so it would take us from point A to point B much faster. But if we look closer at the front wheel of this bicycle, it doesn’t work, it’s mounted wrong. In order to ride it, we had to keep wrenching the handlebars from one side to the other. That’s what it felt like when we used our instrument, it just didn’t work properly. I’ll talk more about how exactly it didn’t work and what issues we had.
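To give a rough idea of what such a pattern-A-to-pattern-B rule looks like, here is a minimal sketch in the style of jscodeshift, a common codemod toolkit. The talk does not name the exact toolkit Slack used, and the function names here are illustrative:

    // Minimal jscodeshift-style codemod: find a pattern and rewrite it
    import type { FileInfo, API } from 'jscodeshift';

    export default function transformer(file: FileInfo, api: API) {
      const j = api.jscodeshift;
      return j(file.source)
        // Pattern A: calls to a hypothetical mountComponent(...) helper
        .find(j.CallExpression, { callee: { name: 'mountComponent' } })
        .forEach((path) => {
          // Pattern B: rewrite the call to a hypothetical renderComponent(...) helper
          path.node.callee = j.identifier('renderComponent');
        })
        .toSource();
    }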
In this slide, I just wanted to show you how an AST works, where we break down this code, console.log('Hi');, into three pieces. The first object is an identifier with the name "console". The second one is another identifier with the name "log". The third one is the argument, a string literal with the value "Hi". There is a sketch of that tree right after this paragraph, just to give you an idea of how it works. Then, the AST logic for our specific use cases. We had this function, mountComponent. We pass some props, we create a variable wrapper, we store the result from the mount method, and then we return this wrapper. To make the conversion, with Enzyme at the top and RTL at the bottom, we had to go through those five steps. That’s not really that bad. There’s also JavaScript and TypeScript complexity: there are so many different ways to abstract this logic.
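This is roughly what that tree looks like written out, in a simplified, Babel-style shape; a real parser adds position data and wrapper nodes such as an ExpressionStatement:

    // Simplified AST for: console.log('Hi');
    const ast = {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        object:   { type: 'Identifier', name: 'console' },  // first piece
        property: { type: 'Identifier', name: 'log' },      // second piece
      },
      arguments: [
        { type: 'StringLiteral', value: 'Hi' },             // third piece: the argument
      ],
    };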
Example number one uses an arrow function approach. In example number two, we first declare the variable and then assign a function to it. For each of those patterns we had to create different AST rules; a sketch of the two shapes follows below. That’s the traditional, conventional way of performing migrations. Clearly, at this point it was already difficult. There is also Enzyme complexity. We went through our whole code base and looked at how many methods were used, and there were 65 of them. We decided to convert the 10 most frequent ones first. For example, find was found in 13,000 places. And that’s not all.
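As an illustration of that variability, here are two hypothetical shapes of the same mount helper; each shape needs its own AST rule. The names mount, MyComponent, and props are illustrative, and imports are omitted:

    // Shape 1: arrow function assigned to a const
    const mountComponent = (props) => mount(<MyComponent {...props} />);

    // Shape 2: declare the variable first, then assign a function expression
    // (an alternative way to write the same helper in another file)
    let mountComponent;
    mountComponent = function (props) {
      return mount(<MyComponent {...props} />);
    };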
On top of that, we can also, for example, call functions like filterWhere, where we pass a callback function. We look at the 'active' prop and check whether it’s true, and if there is a div element with a certain class, then we can do something with it. I just wanted to show you that it was extremely complex. For each of those use cases, we had to create new AST rules. This is how we felt: our brains just exploded, because we knew that we didn’t have enough time to do all of those things. That’s the most traditional, common approach that people usually take.
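For instance, a chain along the lines just described might look like this in Enzyme; the class name is invented, and filterWhere takes a predicate callback and keeps only the matching nodes:

    // Assuming wrapper comes from an earlier Enzyme mount(...) call:
    // keep only the div elements with this class whose 'active' prop is true
    const activeDivs = wrapper
      .find('div.some-class')
      .filterWhere((node) => node.prop('active') === true);

    if (activeDivs.exists()) {
      // ...then we can do something with the matching elements
    }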
And that’s still not all. There is also additional complexity in terms of testing philosophy. RTL interacts with the rendered DOM, simulating real user behavior, whereas Enzyme focuses more on implementation details. Let’s look at this example where I have a div element with a button, and if you click on it, it shows either "Switch is ON" or "Switch is OFF". In the Enzyme example, you actually have access to the properties of that object and you can directly do whatever you want with them. In RTL, you test things from the user’s perspective: first you have to find the text 'Toggle', and then you have to perform the click. This example shows that we cannot really take the Enzyme code and create a rule that would lead us to the RTL version. In the end, it didn’t fully work, but I just wanted to show you that there is a lot of complexity, and no traditional tool was able to handle it. We ended up with a 45% success rate for the code in a file: roughly, 45% of the code in a file would be converted correctly, and the rest you need to convert manually.
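Roughly reconstructing that slide, with approximate component and label names rather than the exact code shown, the two versions of the test might look like this:

    import { mount } from 'enzyme';
    import { render, screen, fireEvent } from '@testing-library/react';
    import '@testing-library/jest-dom';
    import { Toggle } from './Toggle'; // the switch component from the slide, name approximate

    // Enzyme: the test can reach into implementation details
    const wrapper = mount(<Toggle />);
    wrapper.find('button').simulate('click');          // simulate the click on the found node
    expect(wrapper.text()).toContain('Switch is ON');  // could also read props or state directly here

    // RTL: the test interacts with the rendered DOM the way a user would
    render(<Toggle />);
    fireEvent.click(screen.getByText('Toggle'));
    expect(screen.getByText('Switch is ON')).toBeInTheDocument();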
In December 2023, we were experimenting with new things. We still had 12,000 test cases, so we had converted about 1,000 tests. There is a graph here, but this period of time is important because we tried AI. We got access to the Anthropic model Claude 2.1, and we tried to use it to convert our tests. This is how we used it: we had prompts plus the Enzyme test code. You say something like, I need assistance converting this Enzyme test, and you provide the code along with those prompts. What did we get out of it? We had a 40% to 60% success rate for converted code in a file. There was so much variability, because some cases were converted either super well, because they were easy, or disappointingly badly, and we didn’t even release this tool to the developers. They kept using the AST one. We had gotten this new, cool-looking, sharp bike, but if you tried to ride it, it probably wouldn’t work. I would love to try it, but it’s probably not a good idea.
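Conceptually, that first attempt was not much more than the following sketch; the prompt wording and the callClaude wrapper are illustrative, not Slack’s actual prompt or client code:

    // Naive LLM-only attempt: a prompt plus the raw Enzyme test code
    declare const enzymeTestCode: string;                          // contents of the Enzyme test file
    declare function callClaude(prompt: string): Promise<string>;  // hypothetical wrapper around Claude 2.1

    const prompt = [
      'I need assistance converting this Enzyme test to React Testing Library.',
      'Return only the converted test file.',
      '',
      enzymeTestCode,
    ].join('\n');

    const convertedCode = await callClaude(prompt); // top-level await, assuming an ES module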
Failure analysis. This is where we tried to understand why we failed and where we failed, and maybe where we could improve. We looked at humans and at our instruments, and at why humans are so good at this. Why can they convert these tests so easily? They have access to the rendered DOM, pretty much the HTML code that you see in the browser; the React code, which they also authored; the AST conversions that we provided with our codemod; and extensive experience with frontend technologies. Whereas our tools had much less: in the AST case, only the Enzyme code and conversion rules that couldn’t really cover everything; in the LLM case, only the Enzyme code and a prompt. What did we decide at this point? We were very desperate to find a good, working automated solution, because we still had to meet the deadlines and optimize this process. One thing that came to mind was to model the approach after humans and see whether it was going to work or not.
Next quarter, February to May 2024: we still had 8,000 tests in February. In April, we had 1,000 tests. In May, we had zero. We finally got to the end of this process; we converted all the files. How did we do it? We used our innovative approach: we combined AST and LLM. I’ll show you what our pipeline looked like, and then more about the logic that we added there. This is the pipeline, end-to-end, where you provide the file. Steps 1 to 3 are context collection, where we get the file code, we collect the DOM tree for all of the test cases, and we get the partially converted code from our codemod, then we package it all together. In step number 4, we send the AI API request, we parse the response, we run the linters and auto-fix things, we actually run the code and see whether it’s passing or not, and we check the results. If it fails, we have one more feedback step where we also add the logs and dynamically create prompts based on what happened.
Then, whether it fails or not, we output the results. At that time, it was quite expensive to make a lot of API calls, so we only had one feedback step. It’s worth mentioning that step number 4, implementing everything for the AI part, took maybe 20% of our time, and the rest took 80% of our time, but it was a huge game changer. No other tool was able to help us combine all of this very dissimilar, disparate information, like test code, DOM tree, prompts, and partially converted code, so that was where AI and LLMs helped us a lot.
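Put together, the pipeline is conceptually a small loop like the sketch below. The helper names and signatures are hypothetical placeholders for the real steps; the actual implementation is the codemod mentioned at the end of the talk:

    // Placeholder signatures for the real steps (implementations omitted)
    declare function readFileCode(path: string): string;
    declare function collectDomTreePerTestCase(path: string): Record<string, string>;
    declare function runAstCodemod(source: string): string;
    declare function buildPrompt(ctx: { source: string; domTrees: Record<string, string>; partial: string }): string;
    declare function callLlm(prompt: string): Promise<string>;
    declare function parseResponse(response: string): string;
    declare function runLintersWithAutofix(code: string): string;
    declare function runTests(code: string): Promise<{ passed: boolean; logs: string }>;
    declare function buildFeedbackPrompt(ctx: { code: string; logs: string }): string;

    // Simplified shape of the end-to-end conversion pipeline
    async function convertFile(filePath: string) {
      // Steps 1-3: context collection
      const source   = readFileCode(filePath);
      const domTrees = collectDomTreePerTestCase(filePath); // rendered HTML for each test case
      const partial  = runAstCodemod(source);               // deterministic partial conversion with annotations
      let prompt     = buildPrompt({ source, domTrees, partial });

      // Step 4: LLM request, then lint, run, and check; only one feedback step (API calls were expensive)
      for (let attempt = 0; attempt < 2; attempt++) {
        const code   = runLintersWithAutofix(parseResponse(await callLlm(prompt)));
        const result = await runTests(code);
        if (result.passed) {
          return { status: 'passed', code };
        }
        prompt = buildFeedbackPrompt({ code, logs: result.logs }); // dynamic prompt built from the run logs
      }
      return { status: 'needs-manual-conversion' };
    }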
AST and LLM key innovations: we were able to get a 20% to 30% improvement beyond the capabilities of our LLM model using two new things. First, DOM tree collection, and second, LLM control with prompts and AST. DOM tree collection is very specific to RTL, where in order to test something you have to test it from the user’s perspective, so you have to actually see what code gets rendered and how users would see it in the browser. For that we created an adapter that first collects the DOM tree for each test case. Let’s say we have a file with 10 tests, and each of them has its own configuration and renders different HTML code, so we collected all of that and used it in our prompt; there is a rough sketch of this right below. LLM control with prompts and AST is more interesting. I think that’s the most exciting piece that we added.
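Here is that rough sketch of per-test DOM collection, before getting to the prompt and AST control. It assumes a shared Jest test setup that records Enzyme’s rendered HTML for every test case; Slack’s real adapter is more involved, and the names here are illustrative:

    import fs from 'fs';
    import type { ReactWrapper } from 'enzyme';

    // Rendered HTML per test case, keyed by test name
    const domTreesByTest: Record<string, string[]> = {};

    // Called from the shared mount helper after each Enzyme mount
    export function recordDomTree(wrapper: ReactWrapper) {
      const testName = expect.getState().currentTestName ?? 'unknown test';
      (domTreesByTest[testName] ??= []).push(wrapper.html());
    }

    afterAll(() => {
      // Persist the snapshots so the conversion pipeline can include them in the prompt
      fs.writeFileSync('dom-trees.json', JSON.stringify(domTreesByTest, null, 2));
    });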
First, we have AST partial conversion with rules: all of those AST conversions that we knew would result in 100% accurate conversions, we performed, but for everything else we just added annotations. An annotation in this case is just a comment on top of a certain line of code that we didn’t know how to convert. What that gave us was hallucination control. First, each annotation came with a suggestion and instructions, so we gave the LLM the range of things it should convert to, so it wouldn’t produce something completely random.
Second, we labeled every conversion instance, so the LLM knew that it should not, for example, touch the import section or the abstracted rendering logic. Every single line it had to work on was labeled for it. We added this very important piece of data, which was essentially metadata for the LLM; there is a rough sketch of what this looks like below. I haven’t seen anyone else use it before, but for complex conversions it worked super well for us. In the end, we got to this amazing result: 80% of the code in a file was converted successfully. We were very happy. We were able to ride this amazing, vintage-looking, very sophisticated bike that would get you from point A to point B much faster. Of course, it was still not ideal, because if you look closely, the right pedal is still missing. But if it’s a flat surface, you can still get wherever you want to.
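To make the annotation and labeling idea concrete, a partially converted file handed to the LLM might look roughly like this. The annotation wording and helper names are invented for illustration and are not Slack’s exact format:

    // Imports and the rendering helper were already converted by deterministic AST rules;
    // the LLM is instructed to leave unlabeled lines alone.
    import { render, screen } from '@testing-library/react';

    it('turns the switch on', () => {
      renderComponent({ active: true }); // converted by an AST rule
      // LLM-CONVERT: the assertion below is still Enzyme code.
      // Suggestion: query the rendered DOM instead, e.g. screen.getByText(...) or screen.getByRole(...).
      expect(wrapper.find('.status').text()).toEqual('Switch is ON');
    });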
Evaluation and Impact
Evaluation: code quality. Of course, you don’t want to send or share really bad code with the developers; they get very angry very fast. What we did was take a test set of a certain number of files, and we broke that down into easy, medium, and difficult buckets to better represent our code base. We looked at things like imports, rendering logic, JavaScript, TypeScript, assertions, and so forth. These are the results: on average, 80% of the code in the selected test files was converted correctly automatically, and the remaining 20% you have to do yourself. We compared human-converted test cases for a certain number of tests against the conversions generated by our tool. Our quality bar was whatever the frontend developers actually did, so we tried to match them. Impact: we also looked at how many tests we converted and how many engineering hours we saved. We ran 338 files with more than 2,000 test cases, and we looked at how many test files passed.
In this case, 16% were fully converted files. That’s great; it was a pretty decent result. Then, 24% were partially converted files, where in each file anywhere from 20% to 100% of the test cases were passing. That didn’t really give us enough information, because you sometimes have 5 test cases in a file and sometimes 50. So we went one level deeper and looked at all of the test cases whose code was executed and either passed or failed. Twenty-two percent of test cases passed, and we calculated that 45 minutes was saved per test case. Seventy-five percent of test cases did not pass, because the logic was too complex, there were some large requests that the LLM API could not handle, or there were some setup issues.
In this case, we only counted test cases where we knew the code was executed and it passed. We didn’t count anything else. In reality, if a file was, for example, 80% converted but there was a syntax error and nothing ran, we didn’t count that towards this goal, because a year ago there were a lot more AI skeptics, and we had to have proof for every single thing we were doing. In the end, we can say that the AI codemod saved us 22% of developer time, and we have data that supports it.
Also, what if people didn’t use our tool? We also collected data about adoption. Adoption for us was the number of files that people converted with the codemod versus without, out of the total number of converted files. In 64% of cases, our tool was used. That was a really good result. Of course, it was not 80% or 90%, but still, I think people liked to use it because it worked well. Another thing I wanted to mention is that it was a company goal, so people were allocated time, and they were happy to use anything that would make their lives easier. There was also motivation coming from that; it’s not just that we created a really great tool.
Important Findings
Important findings: LLM limitations, lack of real-time feedback. Things such as runtime logs and other data might not always be readily accessible. You need to run your code and get all the errors from the tools that run it, rather than just generating stuff. We can generate as much as we want, but how can we verify that it actually works? In this sense, AI really requires pre-processing and post-processing. It just cannot work really well as a standalone process; it must be integrated into a larger pipeline, where it becomes another tool that we have in our inventory. It’s not always about the model, but about how we use it, how we collect the context, and what prompts we write. Most modern models, based on my experience, all perform very well, but you have to know what confuses them. In our case, we modeled context collection after humans, we wrote good prompts specifically for the Claude 2.1 model, and we converted the tests.
Then, wherever we were not sure whether we converted something properly, we added annotations, and that allowed us to get that really high result. Our approach is transferable to other projects: unit test generation, code modernization, readability, typing, and conversions. There are so many different approaches and pipelines that you can use, but what I like about ours is that you can generate the code, run it, verify it, and add any other step or tool to make sure that your code actually works and gives you the proper results. This year, I’m working on test generation at Slack, and I have really decent results. Maybe next year I’ll share it at the next QCon conference.
Another key point, a little counterintuitive: avoid using AI if there is any other deterministic way to do it. This is the main question that we always ask ourselves: do we need to use an LLM for this specific problem? Usually, we go for something more deterministic. If there is a tool like that, I would rather spend more time understanding it, fixing our logic, or cleaning up the data. AI and LLMs are not the answer to all of our problems. Conversions where LLMs perform well: high complexity, where there’s a lot of variability, no structure, and few rules; tasks that are common in the training set, because if you use some obscure library that is not very common on the internet, you probably won’t get really good results; and dissimilar information that no other tool can consume. This is where LLMs shine.
In our case, where we had test case code, HTML code, prompts, and partially converted code, no other tool could consume all that information, and we got really high-quality results using LLMs. Model after humans: this is what helped us. I’m not sure it will work in every use case, but if, for example, you are planning to create your own project, just think about how humans would go through the process and identify the pieces where you could maybe add one or two LLM calls. It worked for us, and that’s what we modeled our pipeline after.
Summary
In summary, the question that I wanted us to keep in mind: is AI just another overhyped solution when it comes to complex tasks like code generation and conversion, or can it truly handle the intricate details of a code base? As a standalone tool, it’s more of an overhyped solution than not, based on my experience, our code base, and our tasks. I’m not saying that’s the truth, but that’s my personal opinion. If we integrate this tool into a larger pipeline, that’s where we see real benefits. That’s what worked for us; it dramatically improved the quality of our conversions. Without LLMs, I probably wouldn’t even be standing here telling you how we did it.
To look at our journey from beginning to end: we started with manual conversion, where we were investigating things. In the middle, we had the AST phase. We were very frustrated and not sure where to go next, and we were being pushed by our managers to optimize and automate this process when no other tool existed in the industry. Then, at the end, we combined AST and AI, and we were very happy. There’s a very steep decline there; the burndown chart looks great. Then, in May, we still had a few tests, and for that other little line at the end, I think I just deleted a bunch of tests because no one owned them and no one wanted to convert them, just to get to the finish line.
Resources
If you want to learn more about our journey, you can read the blog post about this exact project. We also open-sourced our tool on npm, since there were plenty of requests from other companies that wanted to know how we did it, so it’s open. I hope it will help others not struggle as much as we did.
Questions and Answers
Participant 1: When you were describing the feedback loop, you mentioned that, just due to cost, there was only one feedback step. Did you have any metrics on how much that additional feedback loop improved the results, and whether a second feedback loop would have changed them?
Gorbachov: I believe it was about 10% to 20% better results on the conversion, but the next extra step was already not very useful. We didn’t see any benefits, because if the first one didn’t yield any results, there was already something so wrong with the code that it just didn’t make sense to spend more money or build the pipeline even longer. We just tried once. Also, yes, there were limitations on resources. I was not working on that infrastructure piece, but the team told me not to abuse it.
Participant 2: What about cascading dependencies across files? How do you deal with that? The second question is about the token limit. As you know, LLMs usually have a maximum token size. When you deal with large files, especially in a legacy system, there are very old files and many large files. How do you deal with these challenges?
Gorbachov: If I’m understanding the question correctly, the first one was about dependencies, where you have multiple layers of imported logic. For this specific problem, it was not that big an issue, because the source file already had all of those dependencies properly imported, and they were mostly used for mocking or setup. The setup between the two frameworks was pretty much the same; it’s mostly just the logic of the methods that differs between Enzyme and RTL. In other cases, it simply didn’t work, it just failed, and we didn’t really do anything specific to fix that.
For the second question about the number of tokens: yes, some of the components yielded a query with 8,000 lines of code, and we were rate limited or did not get a proper response. They were not super common, but if we hit that in the first call, then in the second one we did not include the DOM tree. Let’s say you have one file with 10 test cases, and each of those 10 test cases yields a 300-line DOM for a specific component; in the second loop, we just did not include that piece of information. That’s how we tried to limit the number of tokens in the second call. In the first one, we were able to see the specific error, we saw that the error was because of a very long response, and if that was the condition, we just dropped some of this information.
Participant 3: Were you typically running the tool on a per file basis, per test case, per folder? How long would the tool take to run?
Gorbachov: We did not really work on parallelization; it was one file after another. Let’s say you pass it a folder with 50 test files, it’s going to go one by one. First, we did not want to overload our API endpoint. One or one-and-a-half years ago, the infrastructure was not that strong, so even if we had parallelized it, it wouldn’t have worked. One conversion for one file takes maybe two to four minutes, depending on the complexity. For most of the conversions, we ran it at night, when people don’t use our endpoints, so we wouldn’t overload the one API endpoint that is used for other projects. We just did it in CI.
Then, how we presented the results: let’s say for one department, we said we were going to run it, and we gave them a list of files bucketed by success rate, starting with files where 80% to 100% of tests were passing. That’s the primary set of files they would look at. Then 60% to 80% converted properly, then 20% to 40%, so they would go from the highest success rate down. They would just download the files and check them manually. Also, one thing I wanted to mention is that none of this code was merged to our main branch without people actually looking at the code and signing off on it. At that point, we were very careful about it.
Participant 4: You mentioned that your next project is unit tests. Do you have any initial thoughts or ideas on that that you’d be willing to share?
Gorbachov: The pipeline at least looks very similar, but we also have a lot more resources in terms of what the API can handle. Right now, the pipeline looks not just like 5 steps, but probably like 10 to 15, and every step is very intentional. For example, you get a certain error and you can use the LLM to fix that specific error. Then, if you have another problem, you are again very intentional with that specific task. You have to break the whole process up into steps with very minimal, low cognitive load for the LLM, and then in the end you get pretty decent results. That’s the only difference compared to this project. Yes, you still have to run the test, you have to get the run logs, you have to reuse them, and then if there is another issue, you pretty much follow the same steps; it’s just a much more elaborate pipeline.
Participant 5: You just talked about how you used the LLM when the last 12,000 or 13,000 cases were left. How big was the team that worked on this? And was there any feedback about not having started with this solution at the beginning, when you had lakhs of tests to convert?
Gorbachov: I was the main experimenter there, but our team was about five people, and we were converting the tests and improving the tooling, whoever had ideas. Also, some developers who liked AI, and for whom learning new things was a hobby, participated as well. It was mostly the five of us.
The other question, about when we started: yes, I wish we had had this tool at the beginning of the project. Then we would have saved so much more time, and our results would have been much better than 22%. Twenty-two percent was a little disappointing when I saw it at the end, because by then we were left with the more complex cases: humans just go for the easiest thing first and convert the easy test cases, so we were left with more complex stuff that even frontend developers spend days on just to understand what the logic is. We started using the tool more when we had about 8,000 test cases left, so it saved at least 20% of those 8,000, whatever that number is, but I think the real impact was much larger.