Transcript
Srini Penchikala: Hi, everyone. My name is Srini Penchikala. I am the lead editor for the AI/ML and Data Engineering community at InfoQ, and I’m also a podcast host. Thank you for tuning into this podcast. In today’s episode, I will be speaking with Wenjie Zi, senior machine learning engineer and tech lead at Grammarly. She has over 10 years of industry experience in artificial intelligence and holds a master’s degree in computer science from the University of Toronto, specializing in natural language processing.
In this episode, we’ll discuss why many ML projects don’t succeed in the real world and what aspects, both technological and organizational, affect the success of ML projects. We’ll also talk about the communication and understanding gaps that can arise between business leadership and ML practitioners, and how to address them.
Introductions [01:23]
Hi, Wenjie, thank you for joining me today. First, can you introduce yourself and tell our listeners about your career and what areas you have been focusing on recently?
Wenjie Zi: First of all, thank you so much, Srini, for having me today. My name is Wenjie Zi. I am a senior machine learning engineer at Grammarly. I’m also one of the co-founders of TAPNET, the Toronto AI Practitioners Network.
My journey in machine learning started almost a decade ago when I was doing my master’s degree at the University of Toronto through the MScAC program. Since then, I’ve been lucky to work at different companies across industries, mostly exploring NLP along with other areas like computer vision and recommendation systems.
It’s actually been pretty interesting to see how machine learning has evolved over this 10-year journey. I still remember when I first worked at a company called VerticalScope. We were working on spam detection and sentiment analysis using very traditional NLP methods like CRF and HMM models. Then around 2020, I was working at Borealis AI on semantic parsing systems that translated natural language questions into Python or SQL code and then gave back the answers in natural language too. Back then we were using customized BERT models with about a hundred million parameters, which was a pretty big deal at the time. Now I’m at Grammarly, exploring large language models and gen AI technologies. I’ve been involved in building our in-house RAG systems for search. And most recently I’ve been working on improving Grammarly’s new assistant, an agentic chatbot we designed to help users with writing tasks and more complex tasks along the way. So that’s me and my career path.
Srini Penchikala: Thank you. Yes, we can definitely get into some of those topics in this podcast. You’ve been speaking a lot about how ML projects fail, most recently at the last QCon conference, which is a unique topic. Not a lot of speakers talk about it, right? So what motivated you to choose this specific area of the AI/ML space?
Wenjie Zi: Susan Shu Chang, who was the track host at QCon, reached out a couple of months back and asked us to put together presentations on our insights as practitioners in the industry. So I paused to think about what topics resonate with me and what unique perspective I could provide, rather than something everybody has already talked about online.
What came to mind right away is that if you search online, you’ll see a lot of positive things about AI, especially nowadays. People talk about how powerful AI is, what kind of gains they got by adopting it, or technical details of how they pushed the boundaries with new technologies and got a better model out the door. But the truth is, as an AI practitioner, that’s not always how it feels. Most of the time we’re actually pretty frustrated, suffering through failed experiments rather than celebrating success all the time.
So it was important for me to present this side of things, to be honest about the uncertainties in machine learning projects and set realistic expectations for whoever wants to work in AI or adopt AI in their own company. And more importantly, to share some failure points that I’ve commonly seen, to help people avoid making the same mistakes and increase their chances of shipping something to production. I think that’s what motivated me to come up with this talk.
Why Real World ML Projects Fail [04:58]
Srini Penchikala: Can you talk about some of the main reasons why many ML projects fail? What are the pitfalls that machine learning developers and architects should avoid when they work on machine learning applications?
Wenjie Zi: Before we start talking about the common pitfalls, it’s good to talk about the concept of failure for machine learning projects, because machine learning projects are inherently uncertain. It’s actually common to see them fail in the middle, right? Sometimes that’s the ideal situation we want to see. We want people to try out different ideas, experiment with things, and fail fast. In that case, we empower the team to iterate on their ideas and finally get something valuable. Those types of failures are not bad failures. We should actually encourage them.
On the other side, and what we want to focus on today, is which failures we should try to avoid. Of course, different people have different views on this, and depending on your role and the company setup, it can also be unique. But in my presentation I tried to distill it into five common reasons I’ve seen across different teams and industries.
The first one is that projects often start by tackling the wrong problem. Once you have a business problem, you have to translate it into a solution. Sometimes people believe an ML solution is what they need, and sometimes it isn’t. It’s good to keep in mind that not every problem needs to start with an ML solution, because ML is really pretty expensive. Maybe a rule-based solution is good enough, or maybe the company isn’t ready to apply AI to this particular problem. In that case, we should wait a bit.
The second one I talked about is the data challenge. In machine learning, there’s a saying: garbage in, garbage out. It means that if you feed the model garbage data, it’s hard for the model to learn patterns from the data you give it. That emphasizes how important it is to collect a clean version of the data and learn patterns on top of that. And everybody who has worked on real-world problems knows how messy real-world data can be. We’re talking about inconsistent formats, mislabeled examples, biased sampling. These can all cause the project to fail, sometimes silently, until the very end of the iteration, when you do the error case analysis and realize something is wrong with your data.
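As a rough illustration of that kind of data checklist, here is a minimal sketch in Python, assuming a pandas DataFrame with a hypothetical label column; the thresholds are placeholders, not recommendations:

```python
import pandas as pd

def run_data_checks(df: pd.DataFrame, label_col: str = "label") -> list[str]:
    """A basic 'garbage in, garbage out' checklist to run before training."""
    issues = []

    # Inconsistent formats: mixed value types within a column are a red flag.
    for col in df.columns:
        if df[col].dropna().map(type).nunique() > 1:
            issues.append(f"{col}: mixed value types")

    # Missing values that could silently bias the model.
    for col, rate in df.isna().mean().items():
        if rate > 0.05:
            issues.append(f"{col}: {rate:.0%} missing")

    # Biased sampling: a heavily skewed label distribution.
    top_share = df[label_col].value_counts(normalize=True).max()
    if top_share > 0.95:
        issues.append(f"{label_col}: {top_share:.0%} of rows in one class")

    # Exact duplicates, a common source of train/test leakage.
    if df.duplicated().any():
        issues.append(f"{df.duplicated().sum()} duplicate rows")

    return issues
```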
The third problem I talked about in my presentation is that turning a model into an actual product is hard. There is an iconic publication from Google back in 2015 called Hidden Technical Debt in Machine Learning Systems. In its diagram, they show how small the modeling part is compared with the whole system, which includes infrastructure for data collection, model versioning, performance monitoring, and all those other components. It’s one thing to build a model in a notebook. It’s a totally different thing to turn it into a working product. I’ve seen a lot of projects get stuck at this transition point just because of how heavy the lifting is.
The fourth point is that sometimes a model works great offline and people are very happy with it. Then when you launch it to real users, you see a different picture. Maybe people don’t like it, or they’re not using it as frequently as you expected. And that’s where the real value happens. So when there’s a mismatch between offline success and online reality, it leads to poor performance for the entire project.
And last but not least, there are a lot of non-technical blockers: things like unclear optimization goals or a lack of understanding about AI in general from business stakeholders. Many times these lead to the failure of the project even though the technical parts are sound and the team is able to show good numbers. So I think these are things we should pay attention to and try to avoid when we’re working on a new machine learning project.
Technology Challenges [09:15]
Srini Penchikala: Definitely. We can talk more about the non-technical aspects a little later, but can we dig into the technology aspects first, Wenjie? Can you elaborate with a couple of examples? How do you see technology issues causing machine learning projects to fail?
Wenjie Zi: The thing that resonates with me the most is how offline success doesn’t always translate into online success. It reminds me of a project I worked on early in my career, a recommender system where we were recommending images to users. It’s not hard to imagine where the misalignment came from. When we train an offline model, we need to collect historical data, and we can’t use all of it to train the model, so we sample from what we have. That’s different from the online setting, where you take in real-time data one request at a time to get results out. So some problems in the offline data collection and sampling introduced bias that didn’t translate well to the online setting.
The second part that technically caused the misalignment is that offline model training needs evaluation metrics that we can only derive from the offline data, for example the historical click-through rate, whether something was clicked or not. We used these binary decisions, binary labels, as our training objectives. But at online serving time, we launched the model and took in real user behavior to judge whether the model was working, and that’s something you can’t fully optimize for while you’re working on the offline model.
So this misalignment on both the data side and the modeling optimization side can cause issues later on. It’s quite common for people to start with a naive model and get it out the door as soon as possible, so you can start collecting real-world feedback and use it to validate whether your offline evaluation and offline data are correct. That’s a good practice to keep in mind: launch something early and iterate on top of it, rather than closing the door, doing offline optimization for a very long time, launching, and only then realizing it’s not working as expected.
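To make the offline/online distinction concrete, here is a minimal sketch with made-up numbers and hypothetical metric choices, showing the two kinds of measurements a team might track side by side:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Offline: ranking quality on sampled historical click labels (made-up data).
historical_labels = np.array([0, 1, 0, 0, 1, 1, 0, 1])
model_scores = np.array([0.2, 0.7, 0.4, 0.1, 0.9, 0.6, 0.3, 0.8])
offline_auc = roc_auc_score(historical_labels, model_scores)

# Online: click-through rate from real user behavior after launch.
impressions, clicks = 50_000, 1_200
online_ctr = clicks / impressions

# The two numbers answer different questions; tracking both per iteration
# surfaces a gap between offline gains and online reality early.
print(f"offline AUC = {offline_auc:.3f}, online CTR = {online_ctr:.2%}")
```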
Srini Penchikala: Any takeaways from that experience? Can you talk about what your team did in addition to iterative development? Anything else they did to address those issues?
Wenjie Zi: I think the biggest thing is the iteration part. There’s also definitely something about the data: everybody really needs to come up with a checklist of the problems their data might have, and make sure that before the data is used for any of these optimizations, some inspection has been done, some checks have been run, to make sure the data is representative of the problem you’re dealing with. It’s a part of machine learning practice that’s a little bit boring. It requires time and some dirty work, but it’s definitely necessary if you want to get a good model out of it.
Organizational Challenges [12:16]
Srini Penchikala: Can we talk about the non-technical aspects? Did you face any challenges from senior leadership or other organizational elements that also contributed to ML project failures?
Wenjie Zi: As we said at the beginning, a lot of people are super excited about using AI these days, and because of this eagerness, they just throw AI at whatever problems they’re dealing with. The truth is that different companies have different levels of AI readiness, right? Do you have the data logged and saved somewhere the team can use? Do you have the right tools, the infrastructure, the engineering capabilities? All these questions need to be answered before you decide you want an AI solution for your company. That’s the kind of organizational understanding or awareness you need before execution.
The second part is that I’ve seen AI generate a lot of value when it’s successfully adopted in specific industries, like AI in finance, AI in healthcare, and many more. But that’s actually one of the hardest problems to solve, because people working in those industries speak a very different language than the AI engineers working on the projects. When there are misunderstandings between the teams, it can be very disruptive to the project.
Need for Collaboration Between Business Team and ML Practitioners [13:29]
Srini Penchikala: You mentioned a gap between business leadership and ML practitioners. Could you elaborate on that, and on what the team did to address those gaps?
Wenjie Zi: We worked with a personal banking team before, in the credit space. It was pretty easy to observe that while the business stakeholders are great at finance, know the problem space, and understand the pain points of their systems, they’re not trained to identify where ML can help or whether their systems and data are ML-ready.
On the other side, we ML engineers have backgrounds in AI. We know everything about modeling, but we don’t know what delinquency means, what the delinquency orders are in the credit space, or how adjudicators do their work to decide whom to approve and whom to reject. So the first thing our team did was set up deep-dive sessions, where both parties really took the time to talk about the domain we were operating in, its principles, and what we would need to do something successful, so that both parties could understand each other. Even though it takes some time and looks slow at the beginning, it definitely set the team up for success, because later on we understood each other better and appreciated the help we got from each other. I think that’s a good thing to keep in mind if you want to work with domain experts who are not trained in AI technologies.
Srini Penchikala: It always helps for everybody on the project to have some experience or expertise in the domain, right? Understanding the domain helps with everything.
Wenjie Zi: Definitely. Yes. Sometimes it’s very necessary. If you don’t, it’s hard to come up with something that’s valuable for that domain, right?
AI/ML Emerging Trends [15:06]
Srini Penchikala: We can talk about some emerging trends. You mentioned AI agents in your introduction. The recent trends in the AI space are mainly generative AI and large language models, and now, especially in 2025, we are hearing a lot about AI agents and agentic AI systems. How do you see these trends influencing machine learning projects? And most importantly, from your recent experience, are there any pitfalls specific to gen AI that you can share?
Wenjie Zi: One thing I have observed is that people are getting less patient. Because you now have access to API calls that harness the power of LLMs, it’s easy to prototype something using just prompt engineering. A lot of companies’ version zero is built on this idea: just a wrapper on top of something. And because of how quickly you can build that first version, people have higher expectations and less patience about getting better and better results out of these gen AI-powered technologies.
But the truth is that it’s easy to go from zero to one and build something to start with; it’s hard to get good quality out of it. We’ve all talked about LLM hallucination and other problems. If you don’t take the time to collect your own domain-specific data, label it, and evaluate your system on top of that data, it’s hard to know exactly what problems your version zero suffers from and in which direction to improve it. People tend to delay the evaluation work, and I think that’s understandable, but you can’t avoid it forever. You still have to invest in it at some point in your project’s life cycle if you want to stand out and not be a wrapper company anymore. That’s the kind of investment we want to see more of from any company working on gen AI that wants its AI engineers to build something different from what all the competitors are doing.
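As a rough sketch of what that domain-specific evaluation can look like, here is a tiny harness over hand-labeled examples; call_llm, the prompt, and the examples are all hypothetical stand-ins, not a real API:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: replace with a real call to your LLM provider.
    return "refund"

# Hand-labeled, domain-specific examples (hypothetical).
eval_set = [
    {"input": "I was charged twice for my subscription", "intent": "refund"},
    {"input": "Where is my package?", "intent": "tracking"},
]

def evaluate(examples: list[dict]) -> float:
    """Score the model on labeled examples and log misses for error analysis."""
    correct = 0
    for ex in examples:
        prediction = call_llm(
            f"Classify the intent of: {ex['input']}\nAnswer with one word."
        ).strip().lower()
        if prediction == ex["intent"]:
            correct += 1
        else:
            # The misses are what tell you *where* version zero breaks.
            print(f"MISS: {ex['input']!r} -> {prediction!r}")
    return correct / len(examples)

print(f"accuracy: {evaluate(eval_set):.0%}")
```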
Srini Penchikala: Thank you. Anything else you can share on agents and what they’re good for now? There’s a lot of hype, right? What are the use cases you see them used for?
Wenjie Zi: I went to some tech talks in San Francisco, and agents are definitely a big thing there. I’ve seen people use agents to do automated sales and marketing work. It makes sense for sales work, for example, because there’s a lot of manual work, heavy lifting, at the beginning of the entire life cycle. You need to reach out to people by email or other tools. An agent can help gather the information the company already has, generate targeted ads or targeted sales emails, send them to potential clients, and get them interested in your company or project. Your sales team can then focus on the people who are actually interested, rather than doing a large amount of outreach at the beginning of the whole process.
And also, I think everybody right now is working on agentic chatbots, right? Customer service is a very natural place to adopt this type of technology, where the chatbot leverages tools and internal materials to get you the answer faster than if you were interacting directly with a human agent.
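To illustrate the pattern, here is a minimal sketch of a tool-using chatbot loop; call_llm, the tool registry, and the order-lookup tool are hypothetical stand-ins for a real LLM API and real internal systems:

```python
# Hypothetical internal tool the chatbot can call.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: shipped, arriving Friday."

TOOLS = {"lookup_order": lookup_order}

def call_llm(messages: list[dict]) -> dict:
    # Placeholder: a real implementation would call an LLM API that either
    # answers directly ({"content": ...}) or requests a tool call.
    if any(m["role"] == "tool" for m in messages):
        return {"content": f"Good news! {messages[-1]['content']}"}
    return {"tool": "lookup_order", "args": {"order_id": "A123"}}

def agent_turn(user_message: str, max_steps: int = 3) -> str:
    """One agent loop: let the model call tools until it can answer."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "content" in reply:  # the model answered directly
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Sorry, I couldn't find an answer."

print(agent_turn("Where is my order A123?"))
```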
ML Technologies and Tools [18:28]
Srini Penchikala: Can you talk about some technologies and tools you have used in your own projects? If any of our listeners would like to learn more about this topic, what tools can they try out?
Wenjie Zi: For learning specifically, I like online materials. I mean, QCon is a great resource, right? There are a lot of conference recordings, blog posts, and podcasts you can listen to. I like the fact that real AI practitioners and software engineering practitioners are sharing their own experiences. I think those are very valuable.
Besides online materials, another thing I like is books. For example, Chip Huyen is somebody I like a lot. She’s an O’Reilly author with two publications, one called Designing Machine Learning Systems and the other AI Engineering, about building applications with foundation models. Those books let me learn something more systematically, rather than piece by piece from online materials. So I like the combination of both.
Lastly, there are a lot of new technologies evolving these days. I don’t think anybody has a standard yet, so it’s especially hard to learn from mature materials. But there are great talks being organized where you can learn from the people who are actually exploring and working on these things right now. For example, the organization I head, the Toronto AI Practitioners Network, just posted a talk on agents featuring Professor Mengye Ren, who leads the Agentic AI Lab at New York University, Afsaneh Fazly, who is a founder of an agent company, and Himanshu Joshi from the Vector Institute, who has seen how a lot of AI startups are doing agent work. The information I gathered from that one talk gave me a lot of insight into what other people are focusing on.
Srini Penchikala: You mentioned the community groups that you’ve been actively leading. Can you talk about how others can get value out of these community engagements? Any specific groups, and how do you contribute to or learn from them?
Wenjie Zi: We have a LinkedIn page named TAPNET, and we just started an online series that everybody from anywhere is welcome to join. We are going to invite people from San Francisco, the Bay Area, and other places to share their insights. So it’s a good place to keep track of what we have been working on and get involved.
Srini Penchikala: Thanks, Wenjie. Do you have any additional comments before we wrap up today’s discussion?
Wenjie Zi: Nothing in particular, but I’m super excited to have this opportunity to do this podcast, and very excited to see how AI keeps changing these days.
Srini Penchikala: Yes, it’ll be interesting to see how AI evolves this year, because it’s been moving at a fast pace for the last few years. We’ll have to wait and see how it goes.
Thank you for joining this podcast. It’s been great to discuss one of the important topics in the AI/ML space with a practitioner like you: how to successfully deploy ML applications into production and what pitfalls to avoid when working on ML projects. Real-world production implementation is important. To our listeners, thank you for listening to this podcast. If you would like to learn more about AI/ML topics, check out the AI/ML and Data Engineering community page on infoq.com. We have been publishing a lot of articles and podcasts on AI as well as several other topics. So please check out the website. Thank you.
Mentioned:
- Google Paper: Hidden Technical Debt in Machine Learning Systems
- Chip Huyen’s Book: Designing Machine Learning Systems
- Chip Huyen’s Book: AI Engineering
- Toronto AI Practitioners Network (TAPNET) LinkedIn Group
- Talk on agents by Professor Mengye Ren, who leads the Agentic AI Lab at New York University