Transcript
Olimpiu Pop: Hello everybody. I'm Olimpiu Pop, an editor with InfoQ, and I have with me today Birgitta Böckeler, who is a subject matter expert with Thoughtworks in the generative AI space. Well, the actual title is a little bit more difficult than that for me, so I'll ask Birgitta to introduce herself and to say what her role at Thoughtworks is related to generative AI.
Birgitta Böckeler: Yes, hi Olimpiu. Thanks for having me. Indeed, the title of the role I have right now is quite a mouthful. We currently call it Global Lead for AI-Assisted Software Delivery. As you said, it's a subject matter expert role, and parts of my role are developer advocacy, both internally at Thoughtworks and for our clients. So I talk to people at Thoughtworks about what they're doing in this space, I try a lot of these tools myself, I write about them, and I talk about them at conferences, like at QCon, for example, just recently.
Olimpiu Pop: Okay, thank you. Well, now I have to make a cheeky comment, because your title sounds like a German name. Everybody says those are very lengthy, so it goes hand in hand.
Birgitta Böckeler: The joke would probably be that in German, it would be just one word instead of multiple words, but I haven't thought of one yet, yes.
A Snapshot of the Coding Assistants Landscape [01:40]
Olimpiu Pop: Thank you. As you mentioned, you regularly write on Martin Fowler's blog, where you provide updates on current developments in the generative AI space. So, let's start with what's new, because, in my opinion, the generative AI space is currently moving faster than JavaScript libraries. We just blinked, and probably two new tools appeared in the meantime.
Birgitta Böckeler: So what's new? That's the frequently asked question right now, right? I also said this in the talk at QCon: even though staying on top of this space is my full-time job right now, even I cannot stay on top of everything. So nobody has to feel bad if they cannot do that next to their regular day job. And one thing that I said during the talk, I saw on the internet an hour later that it was already out of date. So that's the problem in this space right now. But of course, there are a few things that are starting to become more settled, or certain patterns that we now have in the tools.
So, when you look back at the evolution of the features in the coding assistants: we had auto-complete first, the auto-complete on steroids; then we got a simple AI chat; then the chat became more and more powerful. In coding assistants like Copilot or Cursor or Windsurf or the many other ones, the chat now also has a lot more context of our code, and we can actually ask questions about the whole codebase. And in a lot of them there are context providers, so we can pull in things like the current git diff, and there is more and more integration with other tools, like "give me the text of my current Jira issue", stuff like that, right? And the models have evolved as well, of course.
Olimpiu Pop: Okay. Then let's simplify things. Let's consider tendencies, because, as you said, the tools are moving quickly, and I agree with that. Everything is moving so fast that it's hard to stay on top of it, but you still have the buckets. As you said, there was a way we interacted: I didn't like having a chat in the browser and then moving from one side to the other, or the part where we were just providing comments. It would be better to look at it that way.
Currently, are we at the point where we interact with the model and we have an innovative auto-insert, or where should we position ourselves? When can we say, "Let's call it an autonomous junior developer next to us"? Are we there yet, or not?
Birgitta Böckeler: Yes, and whether this junior developer is autonomous or not, we can get to that a little bit later. But indeed, the newest thing that has been happening since October, November last year is that this chat in the IDE, or in terminal sessions (there are also a bunch of coding assistants where you do this from the terminal, but let's focus on the IDE right now), has gotten a lot more powerful, to the extent that you can actually drive your implementation from the chat, and the chat can now change multiple files at once.
It can run terminal commands. It can basically access a lot of the things that the IDE has access to. And that's the thing that provides more automation. In the sense of: let's say my coding assistant generates code that doesn't even compile or has syntax errors. In the past, I would have had to go there and tell it, "this does not compile", but now it can actually pick up on things like linting errors or compile errors through the IDE, and immediately react to that and correct itself.
Or it can say, okay, let's run the tests, and then it will immediately pick up on whether the tests are red or green and will be able to include that in what it's doing. And that's what has now led people to use the agent word, the A word, for these tools. So that makes it agentic, right? I think there's still no good, comprehensive definition of what an agent is; it's a word that we always have to redefine in the context that we're in. But in the context of coding assistants, it's a chat in our IDE or in the terminal that can actually access all of these tools, like editing files, running terminal commands and so on, and then do things in a more automated way for us while we're still watching.
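To make that loop concrete, here is a minimal sketch in Python of the feedback cycle such agentic modes run. It is purely illustrative: every name is hypothetical and does not correspond to any real tool's API.

```python
# Hypothetical sketch of the feedback loop behind "agentic" chat modes.
# None of these names belong to a real tool; they are placeholders.
import subprocess

def run_checks() -> str:
    """Run the project's own verification and capture any failure output."""
    result = subprocess.run(["make", "test"], capture_output=True, text=True)
    return "" if result.returncode == 0 else result.stdout + result.stderr

def agentic_loop(task: str, ask_model, apply_edits, max_steps: int = 5) -> bool:
    """Propose edits, let the toolchain report errors, feed them back."""
    feedback = ""
    for _ in range(max_steps):
        edits = ask_model(task, feedback)  # model proposes file changes
        apply_edits(edits)                 # assistant writes them to the workspace
        feedback = run_checks()            # compiler/linter/test output
        if not feedback:                   # everything green: stop, and the
            return True                    # human reviews the final diff
    return False                           # still failing: hand back control
```

The point is the shape of the loop: the model's output is verified by the toolchain, and the error output becomes part of the next prompt, instead of the human having to relay it.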
VS Code-Based Code Assistants vs Plugins of the JetBrains Family IDEs [05:43]
Olimpiu Pop: Okay, that makes sense. So now, just to repeat what you said, if I got it correctly: currently, the tools are getting more autonomous. They require a lot fewer interactions, because at points you needed multiple rounds of back and forth to get to what you wanted, but now that seems to happen iteratively, without so much input from our side. And then let me ask you something else, because we are discussing IDEs generically. From my understanding, or the way I look at the field, there are a couple of big players in this space. You have the JetBrains ones, so IntelliJ IDEA for the Java world, PyCharm, and all that family.
Those were quite big in terms of IDEs. Then you have VS Code, which is the little brother of Visual Studio, also from Microsoft, and is widely used. And then there are the new kids on the block that are coming and bringing those new features, which I haven't tried yet, such as Windsurf and Cursor and so on. How are they ranked, based on your experience, in terms of how they introduce these new usage styles? Or maybe a better first question: do they get support through external models or external plugins (I'm thinking now about JetBrains), or do they have native support?
Birgitta Böckeler: Most of the coding assistant action right now is actually happening in Visual Studio Code, especially if you consider Windsurf and Cursor to also be Visual Studio Code, because they are actually forks of Visual Studio Code. And the reason why they forked or cloned it, as far as I understand, is that it gives them more access to the core of the IDE so they can build more powerful features, because a lot of this progress, I would say, is about the integration with the IDE and the developer experience, the user experience in the IDE.
When you have full access to the core of the IDE, you can do a lot more things. So Microsoft and GitHub have the advantage that they own Visual Studio Code; for GitHub Copilot, they can also delve into the core of it. Cursor and Codeium, who are building Windsurf, forked it so that they have more control over this, right? And then there are things happening on the JetBrains side, which for me and my colleagues is a big deal, because JetBrains has been the favourite IDE, especially for Java, Kotlin, and other JVM-based languages, for a long time.
Big organisations started paying for IDE licenses because it was so good, right? It was always free before that, with Eclipse and similar tools. Unfortunately, in the JetBrains ecosystem, things are not moving as quickly. For example, the GitHub Copilot plugin for JetBrains is often lagging behind the VS Code plugin in features. The things that JetBrains themselves are building are also still in progress, so they're now working on an agent, for example, but it's moving a little bit slower. That's one of the things that sometimes slows down the adoption of coding assistants.
This keeps some developers from experimenting, because they're part of the JetBrains ecosystem and prefer it. There's also a lot still to be discovered around where JetBrains' built-in assistance is actually powerful enough; in some use cases, maybe you don't even need the AI, right? But yes, the JetBrains ecosystem is a little bit behind; most of the action at the moment is happening in Visual Studio Code. There are also terminal-based coding assistants. Anthropic recently released Claude Code, for example, which you run in the terminal.
There are some open source tools, like Goose and Aider, that do that as well. And in terms of the models used: at this point, almost all of the coding assistants I've used allow you to plug in your own API key to use models directly, for example your Anthropic API key or your OpenAI API key. In particular, all of them support some kind of access to the Claude Sonnet model series, either by you bringing your API key or by them providing it from Anthropic, because the Claude Sonnet series has been shown to be really, really good at coding.
So when I try out different tools, I usually use Claude Sonnet as the model, so that at least that part is stable and I can compare, because you always have to use these tools a few times until you get a feeling for, just a feeling, does this feel better than that other tool, right? It's really hard to run one or two tests and then say this one is better or worse. So yes: the Claude Sonnet model series; Cursor and Windsurf, I would say, are among the most popular; and an open source VS Code extension called Cline, plus a variation of Cline called Roo Code. Those four, I would say, are the most popular ones in the agentic space right now.
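For context, "bringing your own key" usually just means the tool calls the model provider's API on your behalf. As a rough illustration, this is what a direct call to a Claude Sonnet model looks like with the Anthropic Python SDK; the model ID here is an assumption, so check the provider's current model list:

```python
import anthropic

# The SDK reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: verify the current Sonnet ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)
print(response.content[0].text)
```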
Olimpiu Pop: Let me summarise that. On the tooling side, so on the hammer side, currently the forks and Microsoft's GitHub are the ones that are leading. And we can also guess why, if a comment is allowed: GitHub Copilot has a couple of years' head start, and everybody started with it a long time ago. And then there are the new kids on the block that keep appearing, while more traditional tools, like the JetBrains family, are behind when it comes to "agentic" coding, with the proper quotation marks.
Birgitta Böckeler: And by the way, what we're also seeing when you look at Cursor and Windsurf: Cursor, for as long as they've existed, have always come out with really interesting new ideas about the user experience. And then, a few months later, you see GitHub Copilot come out with a similar feature, right? So there's also this dynamic where Copilot has a lot of adoption because a lot of organizations already host their code on GitHub, so they already trust GitHub with their code, which makes adoption easier. And then they're often followers of the interesting features in other IDEs. That's at least what it seems like from the outside.
Olimpiu Pop: Yes, that sounds quite close to what I felt as well. I haven't followed the space as closely as I did last year, so I didn't know that Windsurf is built by Codeium, but I know the tools that Codeium had before: a lot of tools in the review space, a lot of tools in the testing space. And it always felt like they were ahead of the curve, and now they're closing the gap somehow. But now that I'm thinking about it, we jumped right into the middle of the problem, because we started by discussing coding.
Actually, if you look at the cycle of a project, you usually start with the ideation side and so on, and then you bootstrap the project, which happens only once. I'm thinking about this because at some point, as a company or consultancy, we had to bootstrap a new project at least a couple of times per year. And that pushes me to ask about the no-code tools, or, as they used to be called in the medieval days of coding assistants, things like Lovable and others. I also tried bootstrapping with Copilot some time ago.
“No Code” Tools Like Replit Are Good for Prototyping, Especially Their Intersection With Serverless [12:56]
What's your feeling? Would the tools that you mentioned, Cursor, Windsurf, or even VS Code, be as good for bootstrapping new projects, or should we go to something else, have a prototype, see how it feels on the market, and then get to a more, let's call it, traditional space where you get into coding?
Birgitta Böckeler: Yes, I haven't used those tools like Lovable, Bolt, Replit as much, because they usually also come with a platform as a service, let's say, so you can immediately deploy. And usually, with the types of clients we have, that's not their deployment environment; they have their own environments. But what I've mostly seen my colleagues use these tools for is prototyping. And also non-coding people, like designers, using them to really quickly put together a real working prototype. Most of them I've heard say that when they look at the code, they feel like it's still very "prototypey" code, so they wouldn't want to bring it into production.
But still, everybody is really impressed by what those things can do in the prototyping space. I just haven't tried them much myself.
Olimpiu Pop: Okay, well, that's my feeling as well. To summarise and conclude: we can use them to speed up the ideation side. So, rather than having mocks, as in the traditional space where you have limited options, you can quickly build something like a prototype, gather feedback, and then return to the more conventional space where you build the features for real.
Birgitta Böckeler: It also depends somewhat on the user. I read an article a few weeks or months ago about Replit, and how they have consciously decided not to target coders as their primary audience, but rather non-coders. And part of that strategy is also to lean into the serverless part of the PaaS underneath, right? So they can also spin up a database for you and then connect everything, because that is something that non-coders really struggle with. If they use coding assistants like Cursor, Windsurf or Cline, which build up all of the code, then they don't know: yes, but what do I do with the database, and will that be secure, right?
So I find that space really interesting to monitor, in terms of how you can actually create a safe environment, with a PaaS and with serverless, for a non-coder to sit on top of that and create something with AI. I don't know if it will get there, but it can certainly fill some of the gaps in terms of infrastructure knowledge and all of the things that non-coders, of course, don't have experience with.
Using Generative AI Through the Various Phases of the SDLC [15:22]
Olimpiu Pop: And that leads me to one of the anecdotes from your presentation, about the guy who built a SaaS in hours and then started crying when he realised his code was not production-ready. That's how we should treat these kinds of tools: they produce prototypes, but they leave aside things like operational safety, security, scalability and so on. So it's better to split the work into smaller chunks and build it as we know how, to make it safe and robust enough for real-life usage.
Okay, and now we've touched on what we currently have, but again, the cycle is lengthier than that, and there's the SDLC, which we have to call classical now, even though we never saw it properly adopted. How important is the SDLC now? Because in my mind, it was always a set of best practices: you had to test, you had to put the unit tests in place. Do you think a properly implemented SDLC would help with the adoption of agentic coding, or whatever it's called nowadays? Or should we just rely on the AI to take over our problems?
Birgitta Böckeler: For me, the SDLC, the software delivery lifecycle, is just a description of the fact that there are multiple phases, for lack of a better word, because of course we don't mean waterfall; we mean many, many phases that go round and round in small iterations, in an agile way. For me, that's just a description of how we usually do things. I guess what we now have to question as a profession is: now that we have these new tools, does that change anything? For example, can we skip any of those stages? Are some of them becoming more important or less important? Which of them are well supported by AI, and which less so?
So in that sense, the concept is still really useful. And it's actually interesting: I hadn't heard the acronym SDLC in a long time, and when AI came up, suddenly everybody around me started using it again. And I was like, "Oh, what did that mean again?" I don't know why that is, but I think it's because once more as a profession, which we do in cycles all the time, we're looking again at everything we're doing and questioning everything, really trying to go back to: what is it actually that we are doing, and what does AI mean for that?
So it's once more a very reflective moment in our profession; we seem to do this every 10 years or so. So far, I haven't seen anything … I mean, of course, there are a lot of experiments about "let's automate the whole thing" or "let's automate the whole dev part", but then the specification that humans write becomes super important. And what about the verification? Do we also want AI to do that, or is it this whole arrangement where humans specify, the machine builds, and then we verify again? Will it be that, right?
And with what we have right now, and also some of the challenges that we still have with it that we haven't even talked about yet (those coding assistants cannot really autonomously build a whole feature for you at the moment, but we can get to that in a few minutes maybe), I don't see a way to automate the whole thing away. In particular when it comes to specification, and when it comes to testing and really knowing that the test is checking what we've really built. So testing, I think, is the space that has to change a lot, or is changing at the moment.
Yes, all of these questions: should I have AI generate my tests, and then do I know what they actually check? But you can definitely use AI in some form for all of these mini phases and task areas, as it's very language-based work.
Olimpiu Pop: I have to bring up one of the keynotes from last year. We had Trac Bannon talking about the SDLC space. She's looking a lot at how we can augment it; she has similar research going on her side, from what I remember, related to what you're saying. She said: fix your SDLC, meaning have the dot on the horizon in the continuous deployment space, where you need automated testing and checks in place that ensure all the phases are properly followed, and then you should be on the safer side.
And about testing: it was either generate the tests and write the code yourself, or the other way around. Don't do both of them using generative AI, because then it'll be very skewed towards whatever the generative AI produces. Don't use it left and right.
Birgitta Böckeler: Or it depends on the situation, maybe, yes. What I would also say: I see some of our clients asking, "Oh, how can we use AI to fix all of the problems that we have in our path to production?" Path to production is maybe also a good word for SDLC, right? And I think it can be really dangerous when you use it as a band-aid while you actually have underlying problems where you have to fix the root cause, in your delivery pipelines or in your testing process. Because otherwise, it can become a band-aid that actually makes things worse, because gen AI amplifies indiscriminately.
It can potentially amplify what you have, and if that is really bad, or there's something wrong in the core, that can be problematic. The other thing is that, because what we do is a complex system, if we just increase the throughput of one part, the coding, for example, with coding assistants, then we'll get bottlenecks and other second-order effects in other places. So if you can code faster, can you also review the code faster? Can you fill the backlog faster? Can you create the designs faster? Can you deploy fast enough? Will you have more technical debt? If you have higher feature throughput, how are you doing product management around that?
Some of those things we can mitigate with AI in other areas as well, but some of them you can't fix by just speeding up the machine, if you don't have a good underlying process with low friction.
Olimpiu Pop: So as you said, AI is an accelerator. If you're going in the right direction, you'll arrive there faster. But if there are a lot of potholes in front of you, you'll just get more broken knees and broken ankles, because you'll drive the same way, only faster, and you'll have more problems, right?
Birgitta Böckeler: Yes, and because of the non-deterministic nature, there's a much higher risk that if you don't know what you're doing, it might actually make things worse, yes.
How to Use Agentic AI in Day-to-Day Coding [21:47]
Olimpiu Pop: But we did have a point earlier where you said that there are challenges. So, thinking about the normal day-to-day job of a developer, a classical developer who knows what she's doing: what are the limitations? What can we do, and what can't we do?
Birgitta Böckeler: Let's maybe focus on these agentic modes that we started talking about in the beginning, which are now even more powerful companions in the IDE, where I can say, okay, I need a new button on the page, and when the user clicks it, the following things happen; and then the AI goes and changes one, two, three, ten files for me and creates a test for me, and so on. So first of all, I would say it cannot be autonomous. It has to be supervised. As a developer, I still sit in front of it and actually watch the steps, intervening when I see it going in a direction I don't want it to take.
Especially for larger, non-trivial things, I haven't seen an agent do anything autonomously without my intervention yet. I mean, the simplest case is when the result doesn't even work, but that's obvious; there are more insidious things as well, about the design, that might make the code less maintainable and extensible in the future. Or take tests: it is sometimes quite good at generating tests, but that can also give a false sense of security. I've seen it generate too few tests, and, on the flip side, redundant tests: too many assertions or too many test methods, which then make the tests very brittle.
And in the future, every time I change the code, suddenly I get maybe 30 red tests, right? Who hasn't been in that situation with a codebase? Often, in tests, there's too much mocking. It puts the tests in the wrong places, so I might not find them in the future. And it's also really hard to get it to first write a red test and actually show me the red test, so I can use that as a review mechanism to see if the test makes sense; that would give me a lot of security when reviewing this. But it just doesn't do that.
It immediately goes for the implementation. So there's this whole testing space, right? And like I said, the design it produces is sometimes just too spaghetti-like. So it is like a junior developer, as you said before. And at this point, I've had many examples where a light bulb went on above my head: "Oh, okay. Yes, I see how it could get that wrong". And then the next day, when you do it again, it gets it right. That's the other thing.
Olimpiu Pop: Okay, so it’s learning based on your feedback or how-
The Probability of the Coding Assistants Getting It Right: What You Need [24:17]
Birgitta Böckeler: No, no, it's not. It's statistics, it's probability. One time it works; one time it doesn't. It has nothing to do with learning. Yes.
Olimpiu Pop: Okay, so you’re just throwing the dice.
Birgitta Böckeler: A little bit, yes. So our job as developers becomes, one, to assess: is it worth using the AI in this situation? Is it making me faster or slower? We also have to know when to quit. I often just throw something at it where I already know I haven't described it very well, but let's just see what it does; and that helps me think through the design. And then I revert and do it again, or do it myself. So we have to assess how we can increase the probability that it's doing what I want, which we can do through prompting.
We can also do it through features like custom rules, or through having good documentation of our codebase, things like that. So: how can I increase the probability? But we can never have a 100% guarantee that it always gives us what we want.
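As an illustration of what such custom rules can look like: the exact file name and format differ from tool to tool, and every convention listed here is invented for the example.

```text
# Example project rules for a coding assistant (contents are illustrative)
- Use TypeScript strict mode; never use `any` without a justifying comment.
- Follow the existing structure: new features live under src/features/<name>.
- Write a failing test first and show it to me before implementing.
- Prefer our in-house HTTP client wrapper over calling fetch directly.
- Keep functions small; extract helpers instead of nesting deeply.
```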
Olimpiu Pop: Okay, fair enough. For a long period of time, I looked at the space and disseminated ideas like pair programming with the AI: I'm writing something, and there it is, generating the test. But then I started challenging myself, because if it only looks at my code and generates the test, as you said, it'll generate a green test based purely on what I'm doing. So if I introduce a flaw in the code that I wrote, then I don't have any way of knowing that it was problematic.
Birgitta Böckeler: Exactly, yes. I've also seen it do things like: I say, "Oh, the test is red", and then how does it know whether it needs to fix the test or the code? Sometimes it does it the wrong way around, and if you don't pay attention, it actually introduces a bug in the code. Yes.
Olimpiu Pop: Okay. So how can I increase my probability of generating proper code? Would proper requirements, given as context, help with that? Is it possible to provide "this is the requirement coming from the customer", or from the BA of the team? Would that be possible to bring into this mix to make it better?
Birgitta Böckeler: Just technically, there are more and more ways now to integrate with context providers, like I said before with a Jira ticket, making it more convenient to pull that in. But of course, it still depends on how that ticket is phrased. I mean, who hasn't seen those Jira tickets that just say "Fix the button"? I can't come up with a better example right now. So, you increase the probability by being more specific about what you want. I've also heard this from, and confirmed it with, colleagues who use these tools daily on codebases that are in production.
When they show me examples of things they implement with it, and how they describe it to the AI, it's often very specific: here are the five fields I need on the UI, this is the database schema we want. So it's relatively low level, and that increases the probability. It also increases the amount of time, of course, that you have to spend describing it in natural language instead of in code. But in those situations, it still often feels to me like it's reducing my cognitive load, and it's worth using the AI and spending the time describing all the details.
That is one way to increase the probability: having a plan already.
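To make that level of specificity concrete, here is a hypothetical prompt in the style described above; every field, endpoint and table name is invented.

```text
Add a "Create shipment" form to the orders page with five fields:
recipient name (required), street, postal code, city, and country
(dropdown with ISO codes). On submit, POST to /api/shipments and persist
to the existing `shipments` table (columns: id, order_id, recipient_name,
street, postal_code, city, country_code, created_at). Reuse our FormField
component and add a unit test for the validation rules.
```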
Olimpiu Pop: Great, thank you. Now I have to relate back to something that you said earlier. You mentioned the verification phase in the SDLC, but we also spoke about tests. What I'm thinking is that if I'm thinking about code, I think about tests; if I'm thinking at the solution level, I think about verification. And my feeling, after QCon and some other conferences I attended in this period, is that verification is becoming very important, especially in the AI space, because when we're talking about the black boxes that AI systems are, we cannot really talk about testing.
We don't have a proper interface, so we talk about verification instead. And that pushes us to a whole different level, from my perspective. Any thoughts on that?
Birgitta Böckeler: Yes. In general, when we use these tools, we always have to think about: what is my feedback loop? When the AI does something for me, what is my feedback loop to quickly know that it's the right thing? And it can be small things: the IDE now takes care of some of those for me; when the syntax is totally wrong, the IDE will tell me, to take a very low-level example. And then, the higher the abstraction level gets, the more I, as the human, have to be involved. This even goes all the way to the idea of: should we be the ones writing the tests, and then the AI does everything else?
There's a video, for example, on Dave Farley's Continuous Delivery YouTube channel; I think it has been renamed to the Modern Software Engineering channel. I don't remember the title, but he was speculating: will tests be the specification of the future, the coding of the future, so that we write the tests and everything else is done by the AI? Because we wrote the tests, and we have to be very specific in tests, we give them to the AI as a very precise specification. That's how we know that it works, and that's all we do in the future, right?
Writing tests will be coding. So that was one of his speculations, which I found quite interesting.
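As a toy illustration of that speculation: the human writes the executable specification, and the AI's only job is to make it pass. A sketch in Python with pytest, where the `pricing` module stands in for the hypothetical code the AI would generate.

```python
# Human-written specification; the AI implements discount() to satisfy it.
import pytest
from pricing import discount  # hypothetical module the AI would generate

def test_no_discount_below_threshold():
    assert discount(total=99.99) == 0

def test_ten_percent_at_threshold():
    assert discount(total=100.00) == pytest.approx(10.00)

def test_discount_is_capped():
    # 10% of 10,000 would be 1,000; this spec caps the discount at 500.
    assert discount(total=10_000.00) == pytest.approx(500.00)
```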
Olimpiu Pop: Well, I cannot say that I disagree with that, because in the end, there was a lot of movement there. If you think about Kent Beck and his TDD, that was very good at the low-level side, where you're writing the code and the architecture emerges, but in very small increments. Now, we're probably looking at a different level. It'll probably be like BDD, behaviour-driven development, where you think about everything upfront. And I have to admit, I tried to work with a couple of product owners to do that, trying to mimic some kind of DSL and generate the big chunks.
But more than that, my feeling is that it's more important than ever to have a product development mindset, where you think about the whole thing and understand it. All the things we are discussing now are things you learn from trial and error. You have to be a seasoned developer to understand this. And when I say seasoned, I'm thinking about two things: one is how to do the coding and build the software itself, but the other is that you understand the industry you're operating in, the rules, and how things work.
The Software Development Career Ladder Might Change [30:32]
You cannot be a junior and graduate straight into that. So what are we doing with the junior developers? Do we just eliminate the title of junior developer and say that everybody is a senior developer? Or how do we help them get to this position?
Birgitta Böckeler: The meaning of seniority might change, right? But this is definitely a frequently asked question. When I was at QCon, people asked me this all the time, and in part I also don't know; we'll have to see. I always say that I don't want to romanticise how I learned back in the day and now say, "Oh, the young people, they're going to do it wrong". I copied so much stuff from the internet and just tried it: does it work or not? Before even looking at the code that I pasted, to be honest. So I guess it's just that there's more throughput of this now, more speed to it, right?
So we'll have to see how teams can buffer that. Often it is buffered by the senior people on a team: if somebody junior makes a mistake because of something they didn't know yet, the other people on the team might catch it. But now, if the throughput of mistakes becomes higher, can you still buffer that within your team? I guess that's one of the questions. Although I would hope that today, you still learn the way we did back then: by doing something wrong and having it caught by the safety net around you, both the automated one and the people.
And then I learned from that. So I hope it'll continue like that. Also, it's often the people who are worried about the junior developers who haven't actually used the AI tools themselves that much. So I think it's super important that experienced people especially, even when you're sceptical (and I get it, I also go back and forth between being excited and sceptical about this), use these things and understand what they mean, because you can't just learn it from a conference talk or a manual. Because the beginners who are coming in are all using these tools, let's be honest.
And when we tell them, "No, you can't do it that way with AI", they'll say, "Why should I trust you? You're not even using these tools". So we have to use them ourselves, so that we understand where the risks are, can help the new people coming in, go through this cognitive shift together, and hopefully evolve our practices in a responsible way.
Olimpiu Pop: Okay, fair enough. While listening to you, at some point I was imagining ways of bringing people up to speed while keeping a proper ratio of younger folks to more senior developers. One of the things I was envisioning was: rather than generating tons of documentation on how to do the database, how to do the coding, and so on, incorporate all that information into, let's call it, a CLI that helps you with that. Say "make a database in the cloud": you give it a size, you give it a name, and it just generates it.
And I envision that a junior developer, a younger person, or even someone who just joined the project, can initially use the tool to get something done, or to validate things if it's about coding. Then, as the days pass, they start understanding it and gaining perspective; and since the tool is open source, they can look at how it's made, find a bug, fix it. Would approaching it like that, where we have guardrails around everything we're doing, and where initially you just follow the rules until you understand the status quo well enough to challenge it, would that fix it, in your opinion?
Birgitta Böckeler: Yes, maybe not fix it, but definitely: the safety nets that we've always had, we need to double down on them even more than before, in our pipelines and whatever we can automate. But you're actually bringing up a good point. People often fixate on the risks with junior developers coming in, but there are also so many new opportunities for them to learn faster and to discover information faster, without always having to ask somebody more senior. You were giving the example of descriptions and documentation of things that you can then also feed into the AI.
So as a more senior developer on the team, by giving everybody else custom rules for the coding assistant, you can actually amplify certain conventions, and humans can read those custom rules as well, to learn how this team codes. At Thoughtworks, we're also experimenting with prompt libraries for practices, not just for coding. My favourite example is always threat modelling, because it's security, it's a daunting practice: a lot of people don't know how to do it, so they procrastinate or don't do it, right?
But you can actually describe certain threat modelling practices, like the STRIDE model, in a prompt. And then, if you give it the context of what you're working on, the AI can give you examples of these things in your particular context, so you don't have to read the theory and apply it to your particular problem yourself. That is one example where AI can actually help us understand things a lot faster and apply them to our context, which might be super helpful for junior people coming in, and might actually help them learn things faster than we ever did, right?
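For instance, such a practice prompt could look roughly like this; the system description is invented for illustration.

```text
You are helping us run a STRIDE threat-modelling session. Our system: a
React single-page app calling a Java REST API, which stores customer
orders in PostgreSQL; authentication goes through our corporate OIDC
provider. For each STRIDE category (Spoofing, Tampering, Repudiation,
Information disclosure, Denial of service, Elevation of privilege), list
two or three concrete threats for this architecture and one mitigation
we could discuss as a team.
```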
Olimpiu Pop: Okay, thank you. Is there anything else that you think is important to touch upon, something in our conversation that we need to underline?
Finding the Right Balance Between Gen AI Sceptics and Enthusiasts [35:41]
Birgitta Böckeler: Something that I talk about a lot at the moment is the risk to quality that comes with the increased level of automation, with the agents that generate even more code for us. I personally feel it all the time when I use these tools: on the one hand, it feels great, and I use them all the time, not just because it's my role but also because I like it; but you get sucked into this temptation of "it works, maybe I can just push it", right? And I always find things that, as a responsible developer, I should still change before I push, because I also work on codebases right now where other people work as well.
So I have to think about how this affects them. Will they actually understand how this works? Will this get in the way of the other thing they're working on? Will we be able to maintain this in the future? But the temptation is really high to get lazy, to be complacent, and to just think, "Ah, it's going to be fine". There are just too many things that I find before I push. You always have to review, review, review, really pay attention. And it's like driving an autonomous car: your attention goes down because you just let it drive, and then when something happens, your attention isn't there.
It's not a great analogy, because the risk profiles are very different, but I find myself just watching the agents going and doing things, and it feels like such an extra effort and barrier to then go and review all of that code, because we like creating code, not reviewing code, right? So there's a real reckoning that might be coming at some point here, not just for junior developers but also for senior developers, if we just stop thinking along.
Olimpiu Pop: I think it's already coming, if you look at the way malicious code in the software supply chain is growing. The threats in the software supply chain are growing each year; I think we're already there. I had the numbers in front of me just before our discussion: in 2024, we had 700,000 malicious packages, while in the previous year we had only 250,000. And that 250,000 was already double the previous three years combined. So the trend is growing exponentially, probably even more than that. As you said, the reckoning is coming, and we have to fix things.
Birgitta Böckeler: And that's one psychological side of it. Another psychological thing that I see right now is this culture in some organizations of "everybody has to use AI" and "why are you so slow? You have AI now". And then, on the other hand, sceptics who say, "This is stupid. Why are you even trying this?" So there are these two polar ends, and people in the middle. And I think we need cultures right now where the enthusiasts pull us up and the sceptics pull us back down a little bit, so we learn about this and don't go too fast and run into all of these new security vulnerabilities and attack vectors and so on.
But at the same time, we also shouldn't ignore it and say it's going to go away, because it's not going to go away. So we all have to use it to figure out how to do it responsibly. I think there needs to be a balance, and the hypers and the sceptics have to work together somehow.
What To Assess, Adopt or Hold According to the Thoughtworks Technology Radar [38:59]
Olimpiu Pop: Okay, thank you. And to wrap up: given that you are a Thoughtworks representative, and the Technology Radar from Thoughtworks is the north star of a lot of people in the field (it was even copied by many), let's try to put some of the things in the AI space on the quadrant and see what we should try or not. Agentic coding: how do you see it? Should we hold on it, trial it, assess it? How should we embrace it?
Birgitta Böckeler: Yes, I was just thinking I could actually check what we have on the Radar right now. So thanks for the plug: the Thoughtworks Technology Radar, where twice a year we put together our snapshot of what we see right now on our projects and sort it into these rings: adopt, trial, assess and hold. We don't have anything in the adopt ring in this particular space. In the coding assistants space, I think GitHub Copilot, Cursor and Cline, for example, are in trial right now. Trial is the ring where we put things that we've used, quote-unquote, in production, which in the case of coding assistants means we're actually using them on projects with our clients for production code.
Windsurf is also on there, in assess. There are also tools like the ones you were talking about, Replit and Bolt and all of that; we have v0, which is something from Vercel, in assess as well, because some of our teams have tried it. We have a few things in the hold ring, which sometimes means "don't do it", but can also mean "proceed with caution". One of those is complacency with AI-generated code, which I've just talked about quite a bit. We also have "replacing pair programming with AI" in hold. Thoughtworks has actually always been a big proponent of pair programming.
And while AI agents can cover some of the pair programming dynamics, so that you actually have two brains instead of one (and now with the agents there's maybe also this split where the agent does the tactical thinking and I do the strategic thinking), pair programming is actually a practice to make the team better. It's also about collaboration and collective code ownership and context sharing and all of those things. So we don't think that can be replaced with AI.
Also, for some of the risks that we talked about, pairing can actually be one mitigation: to have a pair working together with the AI assistant. That can be a really interesting technique when you have juniors and seniors on a team and you want to see how each of you is using the assistant and learn from each other.
Olimpiu Pop: Okay. So from our whole conversation, it feels like we are not redundant just yet. Maybe tomorrow we might be, but for now it seems more important than ever to use our brains, push things in the right direction, and find the common ground between scepticism and enthusiasm: playing nicely and carefully with everything, but also embracing the change. Would that sum it up?
Birgitta Böckeler: Yes, that's a good summary.
Olimpiu Pop: Well then, Birgitta, have a nice day, and talk to you soon.
Birgitta Böckeler: Thank you for the conversation, Olimpiu.