Transcript
Daniel Hewlett: I’m Daniel Hewlett. I’m a principal AI engineer at LinkedIn. I was the AI tech lead for this Hiring Assistant project.
Karthik Ramgopal: I’m Karthik. I am a distinguished engineer at LinkedIn. I’m responsible for all the generative AI applications and the generative AI platform at LinkedIn. I’m going to start off this talk with how we got started. In 2022, 2023, ChatGPT was all the rage. All our execs were like, we need to do something with GPT, it’s super cool. We did not know what to do. We said, let’s start with a really simple product. We built this product called collaborative articles, which is very simple in the sense that we ask GPT to pick a topic and generate an initial article, send out notifications to experts on that topic, and ask them to collaborate, again using GPT to enhance the article. Effectively, the article was collaboratively iterated on and improved. The reason we picked this use case was very specific.
Firstly, we wanted the simplest integration possible: essentially, prompt in, string out, nothing fancy. We had hardly any Retrieval-Augmented Generation, and what we did have was simple. Most importantly, we did not have any user input. There was zero user input going into the prompt; it was essentially the system generating everything itself. We obviously had no memory, a simple system. Everything was done offline, so we did not have to worry about latency, capacity, all these nasty problems. We used GPT-3.5 to start with, and then GPT-4. A simple product. We learned a lot from it. Since then, it’s been sunset.
The Coach and Agent Era
Then we come to what we call the coach era, where we built a bunch of conversational assistants. Here, we decided to up the ante a little bit. We started having chains of prompts. Typically, we would have an intent classification/routing prompt, followed by a planning prompt to pick the right set of tools to execute, followed by the actual synthesis prompt. We started using Retrieval-Augmented Generation pretty heavily here, connecting to a bunch of internal systems. We obviously had user-defined input.
As you can see, people are asking questions. It’s giving responses, which essentially meant we had to ramp up our trust defenses quite a bit to guard against malicious input. We needed conversational memory here because people could ask questions about stuff they had already asked. We had to do online inference because it’s an online product, so latency and capacity became real constraints. Here we used a combination of OpenAI as well as in-house models.
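To make that chain structure concrete, here is a minimal sketch of a coach-era style chain: an intent classification/routing prompt, a planning prompt that picks tools, then a synthesis prompt grounded in the retrieved results and the conversation history. The function names, prompt wording, and intent labels are illustrative assumptions, not LinkedIn's actual code.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM inference endpoint is in use."""
    raise NotImplementedError

def run_tools(plan: str) -> str:
    """Placeholder for executing the planned tool calls against internal systems."""
    raise NotImplementedError

def answer(user_message: str, history: list[str]) -> str:
    # Stage 1: classify the intent / route the request.
    intent = call_llm(
        "Classify the intent of this message as one of "
        f"[search, candidate_question, other]:\n{user_message}"
    )
    # Stage 2: plan which tools (retrievers, internal APIs) to call.
    plan = call_llm(
        f"Intent: {intent}\nMessage: {user_message}\n"
        "List the tools to call and their arguments, one per line."
    )
    tool_results = run_tools(plan)
    # Stage 3: synthesize the final answer, grounded in the retrieved data
    # and the conversational memory.
    return call_llm(
        "Conversation so far:\n" + "\n".join(history) +
        f"\nRetrieved context:\n{tool_results}\n"
        f"Answer the user's latest message: {user_message}"
    )
```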
Since last year, we have been in what we call the agent era, where instead of having just conversational assistants, we have moved to more of task automation using agents. Instead of prompt chains, we have prompt graphs now. These are graphs of prompts which get executed. We have pretty advanced Retrieval-Augmented Generation, which we will cover. We have these abstractions for skills, which are slightly more advanced than tools; again, we’ll cover that later. We have an explicit concept of human in the loop, where if the agent cannot do something, it escalates to a human.
After that, the human responds. The human is actively in the loop. We have conversational memory, which we had before, but we also have this cool thing called experiential memory, where the agent learns from its experience of interaction with the human and it stores things. Of course, right now we have online, offline, and nearline inference. Again, we’ll cover why. We are using a combination of different models right now.
From Coach to Hiring Assistant
Daniel Hewlett: This is a product which is very new and we’re very excited about. This is the CEO of LinkedIn announcing the Hiring Assistant just back in October at our Talent Connect conference, which is an event we hold for recruiting leaders and people in the HR space every year. That’s where we debuted this. It’s currently in a limited charter with a select group of our partner enterprise customers. I’ll say more about that. I want to take you on the journey a bit of how we developed this new Hiring Assistant, what it actually does and looks like, and some of the things we learned along the way. To do that, I’m going to go back a little bit so you have some context on the problem that we’re trying to solve here, because it may not be something that you have used yourself. How many folks have ever received an InMail from a recruiter on LinkedIn? How many people have used Recruiter Search to find someone and send them an InMail?
Most of the folks that are using LinkedIn as LinkedIn members are on the receiving end of these InMails, and it’s one of the ways that you can find new jobs and new opportunities through our platform. Today, we’re going to talk about the LinkedIn Recruiter tool and the generations of AI assistants and the new agent that we built on top of it. I just want you to understand a little bit from the recruiter perspective what this has looked like. LinkedIn Recruiter has been around in some form since 2008. It was already around for 10 years when I joined the company in 2018. I’ve been working on it since then. For a long time, this had many new features, many iterations, but there was a basic way that you worked with it.
On the left side, you can see a set of search filters. As of today, there’s over 40 search filters and facets you can put on there. You can also create your own custom ones as it shows there. You can do text searches. You can find people that are open to work. You can search about connections in your network, things like that. It’s a real rich tool that professional recruiters use every day to do their jobs. The classic workflow we’ve had most of this time is illustrated in the schematic there. You as the recruiter, you would sit down and start constructing a search query. You would think about, who’s the kind of person I’m trying to find? What kind of skills would they have? What kind of titles would be good for the position I’m trying to source for? You’d build a structured query like on the left there using all those facets and tools. You’d get back some results in the main search area. It’s not shown here, but it’s a list of LinkedIn members.
Then, our AI would be behind the scenes finding those candidates and prioritizing them for you. You would review them. Then based on that, you would refine your search query. Maybe there are just too many, and you want to add some more filters. Maybe they’re not the right kind, and you have to think about what to do there. It’s a very human-driven process.
As Karthik mentioned, last year, we took a step forward to our AI-assisted version of Recruiter Search. We rolled out AI-assisted search in a release we called Recruiter 2024. In the first half of last year, that became available to all the users on our Recruiter product. What we did here, this is in that coach copilot space, we added to that workflow from before, so brought the same flow back. What’s different is, we added AI in new places in the flow. Wherever you previously had to do something manually, you could now use natural language to talk to an AI assistant and have it do that for you. I’ll show you some examples.
Most notably, perhaps, instead of having to interact with all those facets and widgets and navigate through all different things yourself, you could just say, I’m looking for an accountant in New Orleans. It will go and populate that. It makes the UI more accessible to a broader group of people, but still fundamentally, once you do that, you have to check the output, make sure that the query is what you want. You have to review the candidates yourself and so on. There are some opportunities. We learned a lot from this and they really informed our agent design.
One of them is, when you’re thinking about building agents, you’re going to prioritize the most time-consuming cognitive tasks. For recruiters, that task is reviewing the candidates and going through potentially hundreds of candidates that they’re looking at, looking at profiles, looking at resumes, trying to figure out what it is that may make this candidate meet the qualifications that are required for the role or not. Trying to identify those candidates to shortlist and prioritize. That’s something that takes a lot of time for recruiters. We’ve always been helping in a way with our retrieval and ranking system. That’s a search stack. It’s got ranking models in it and so on, but it’s behind the scenes. We’re not working directly with that. We didn’t solve that with our 2024 AI-assisted search workflow.
Another opportunity came from the fact that we were taking natural language and translating it into the format of the tool that we already had. This is an example of taking the space of natural language, which is very open-ended, natural language semantics, where you can say an extremely wide range of things, and trying to put that into a constrained logical structure. If you’ve worked on that kind of problem before, you know that there’s a lot of information lost when you do that. There are things that you’re just not going to be able to render properly into a structured format. That was another lesson we took in there. You’re going to see how we address that in the agent.
We’ll talk just a little bit more about how we were building Recruiter 2024 and how that informed our agent design, which you’ll see. This is an example of that natural language search I was talking about. Here I’ve said, find me accountants in New York with experience in bank reconciliation. It’s done the job of translating that into those structured facets. Everything is standardized based on taxonomies of the entities that exist in the job space. You can now edit that structured query just the same as you could before this assistant came out. We developed this originally, the way you would when you’re prototyping. You would take that text, you’d feed it into an LLM block. Meaning, you have a prompt template, you have an LLM, you might have some RAG in there to retrieve examples for few-shot prompting. That’s a single LLM block.
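As a rough illustration of that prototype shape, a single LLM block might look like the sketch below: one prompt template, a retrieval step that supplies few-shot examples, and structured facets parsed from the completion. The template wording, function names, and JSON keys are assumptions for illustration.

```python
import json

PROMPT_TEMPLATE = """You convert recruiter requests into structured search facets.
Examples:
{examples}

Request: {request}
Return JSON with keys: titles, skills, locations."""

def retrieve_examples(request: str, k: int = 3) -> str:
    """Placeholder for the RAG step that finds similar solved requests for few-shot prompting."""
    return ""

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # whatever inference endpoint is in use

def text_to_facets(request: str) -> dict:
    prompt = PROMPT_TEMPLATE.format(examples=retrieve_examples(request), request=request)
    return json.loads(call_llm(prompt))

# text_to_facets("find me accountants in New York with experience in bank reconciliation")
# might return {"titles": ["Accountant"], "skills": ["Bank Reconciliation"], "locations": ["New York"]}
```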
Once you go past the prototype phase, you really want to make sure that the quality of this system is top-notch. You want this to be something people who are using this tool as part of their jobs, professional recruiters, can use every day, and they don’t have to think about, is the AI going to do the right thing or not? If you have just this one block there, even if you introduce things like, I’m going to use examples to try to show it how to behave, to fix a particular quality issue, that introduction of that example could actually affect the behavior of something else in your system. This is the way LLMs work. They’re non-deterministic. The whole of the context that they operate over influences the output. This leads to bottlenecks in scaling when you’re trying to develop the system, and put a team of engineers behind this.
What we ended up doing here was going to a more modular architecture where we had a manager or interpreter, a coordinator, which would take your input and then figure out, what kind of specialized areas, what sort of filters am I going to need to work with here? I’ve shown this schematically. This is a very simple way of showing it, but it has different sorts of specialists, which were themselves new LLM chains and blocks that someone could work on and improve in isolation.
For example, we can make sure that the location one really understands all the nuances of locations: NY could be New York City, it could be the New York metropolitan area, it could be New York State. We can give it instructions around that without any of that influencing titles, or companies, or any of those other things that don’t have to do with that. You can take each of these modules and improve it independently; this is a common paradigm in software development, but you can apply it to the construction of these GAI chains.
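Here is a minimal sketch of that coordinator/specialist split, assuming an illustrative coordinator prompt and three specialist prompts; in the real system each specialist is its own LLM chain that a team can improve in isolation.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError

SPECIALISTS = {
    "location": "Resolve the locations in this request to standardized geo IDs "
                "(e.g. 'NY' may mean New York City, the NY metro area, or NY State): {text}",
    "title":    "Extract standardized job titles from this request: {text}",
    "company":  "Extract company names or company sets from this request: {text}",
}

def build_query(request: str) -> dict:
    # Coordinator step: decide which specialist facets the request needs.
    needed = call_llm(
        f"Which of {list(SPECIALISTS)} does this request mention? "
        f"Answer as a comma-separated list.\n{request}"
    ).split(",")

    # Fan out to the relevant specialists; each runs its own prompt (or chain).
    return {
        name.strip(): call_llm(SPECIALISTS[name.strip()].format(text=request))
        for name in needed if name.strip() in SPECIALISTS
    }
```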
However, I do want to mention one thing, there’s a tradeoff with this, which is that now instead of one step, you have to go through two or more steps, and each of those steps can potentially have errors because LLMs are after all statistical at the end of the day. Your errors can compound across these stages. If you have 90% accuracy in both stages, you’re going to have 90% of 90%, that’s 81%. The reason to do something like this is because you think you can engineer each of those sub-components to a very high level of quality and thus have an overall better system by applying more either different techniques or more engineering effort across these different tracks.
Building Hiring Assistant
That’s some context from our coach era. Now I want to show how that showed up in building Hiring Assistant, and then how we address some of those opportunities that we showed there. This is a brief video. What you’ll see here is a demo that we announced at Talent Connect that shows the MVP version. What you’re doing on the first step there is you’re putting in your hiring intent. You could do things like attach a job posting, attach some notes you’ve made about the candidate you want to hire. Maybe you can even attach the resume of an employee who’s leaving and who you need to hire a replacement for.
Instead of building a query for you, which means you need to understand the whole Recruiter Search UI and all of that, we’re translating that intent that you put in there into a set of qualifications and other information about your role. Those textual qualifications are something you can read and understand as a person, and they preserve a lot more of the meaning. We’re not losing the meaning of what you wanted by having to go and turn it directly into that logical structure. We can apply LLMs at different stages here. We can also search for candidates using that natural language intent. We can evaluate, based on that natural language intent, whether there’s evidence that candidates meet your qualifications. All of that is enabled because we’ve changed the interface with the user so that we’re no longer interfacing through the traditional search filters, we’re interfacing through a natural language description of your hiring intent. That’s driving a lot of the new features here.
Remember how we talked about modular design in the design of our copilot kind of product, Recruiter 2024 AI-assisted search. Now here in Hiring Assistant, informed by that, we take a similar strategy from the outset, but instead of building particular LLM blocks, like I was saying, like a prompt with some other data around it, now the modular components are themselves sub-agents. In one of the talks, they mentioned this type of design. What we’re doing here, similar to the previous example, the copilot example, there is a supervisor, but here it’s an agent that can treat the other agents under it, the sub-agents, as tools that it can invoke and pass information to, and they can return information back to it, and it can act on that.
Everything we did before, that whole flow chart I was showing earlier, that’s all now one skill under one agent. It’s the same principle, but it’s applied hierarchically. This is also going to enable the same things where we can develop more effectively and evaluate things in parallel. I’ll talk more about that. I’m going to show you an example of how this works in practice. The user gives an input, it goes into our supervisor agent. The supervisor is going to determine, what is that intent about? What is the user trying to do here? Then based on that, it’s going to send the request, possibly with some other information, to another agent. It’s going to send some information to the intake agent and say, I’ve got a request for you, intake. Here’s the information they gave. Here’s some other context about the request. Then the intake agent will take over. It might use some skills to look up user preferences, maybe things that the recruiter typically does, or other things associated with this hiring project.
Then, based on that, it can generate a set of qualifications, which is what we were looking at in the video that I was looping. That kicks off the rest of our flow. I’ll just show another example. I’m just going to go through this one to show you a more complex flow. That’s a basic one I just showed. Here we’re going through and using our sourcing agent, which is going to read those same qualifications we just generated. Then it’s going to use other skills like the ones we talked about to generate search queries, and then feed those through our Recruiter Search index. We’re leveraging everything we’ve built before, but now it’s part of an agent, and this agent can run as many queries as it wants. It can run multiple different queries in parallel and explore the search space in different ways to better mimic what a recruiter actually does. It doesn’t run one search and say, here’s the results. It’s going to run that feedback loop that we were talking about earlier. That’s where we’re going with this.
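A minimal sketch of that supervisor/sub-agent pattern is below. The sub-agent names mirror the talk, but the routing prompt, function signatures, and search stub are illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run_searches(queries: str) -> list[dict]:
    """Placeholder for the Recruiter Search skill."""
    return []

def intake_agent(request: str, context: dict) -> dict:
    """Turns hiring intent (job post, notes, etc.) into textual qualifications."""
    quals = call_llm(f"Derive a list of qualifications from this hiring intent:\n{request}")
    return {"qualifications": quals}

def sourcing_agent(request: str, context: dict) -> dict:
    """Generates search queries from the qualifications and runs them against the index."""
    queries = call_llm(f"Generate recruiter search queries for:\n{context['qualifications']}")
    return {"candidates": run_searches(queries)}

SUB_AGENTS = {"intake": intake_agent, "sourcing": sourcing_agent}

def supervisor(request: str, context: dict) -> dict:
    # The supervisor treats sub-agents as tools: route the request, pass context
    # along, and act on what comes back.
    route = call_llm(
        f"Which sub-agent should handle this request, one of {list(SUB_AGENTS)}?\n{request}"
    ).strip()
    return SUB_AGENTS[route](request, context)
```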
Ensuring Agent Quality in Hiring Assistant
I mentioned earlier that quality is really important, and one of the reasons to have a design that gives you a number of sub-components is so that you can really isolate the quality of each of those sub-components as you’re building the agent. Because, remember, we’re building this whole thing up on top of what we already built, but all those sub-agents and that supervisor agent, all of that is new. We were building that up in parallel last year to get this out in time for Talent Connect. What you’re seeing on the right is a schematic flow of the data through our system at a very abstracted level. You can see each of the different agents I described, and some of the GAI or AI components that they use are called out there in blue. Each of those has a kind of input and a kind of output format that we can evaluate. Now they may go through a lot of different internal steps to produce that output, but we can still evaluate in this way. We went through a common quality playbook for all of these different agent components that we’ve built in parallel.
The basic game plan here is to make sure that we can iterate on the quality of each of them in parallel and independently, and build this thing from the bottom up. Starting with each of them, we would define our rubrics for human evaluation, then use those as quickly as we could to transition to LLM-driven automated evaluation, because the real bottleneck in developing these tools is quality iteration; the unlock is being able to iterate quickly to improve quality.
The more we can move away from heavy human annotation, or from having to have a correct reference output for every possible example we’re evaluating, and move toward reference-free evaluation, the faster we can go. Those kinds of techniques enable us to move a lot faster and to iterate more quickly. That’s really how you can build this up. We’ve got multiple dimensions of productivity. One is that we can develop each of these agents in parallel because they’re modular. We can also measure their quality in isolation. Then, of course, as we get later on, we’re going to have to start stitching all of this together to build up the overall agent there.
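As one illustration of that progression, a rubric written for human annotators can be reused almost verbatim as a reference-free LLM judge, so a component's output can be scored without a gold answer. The rubric dimensions and prompt below are assumptions, not the actual evaluation criteria.

```python
import json

RUBRIC = """Score the generated qualifications from 1-5 on each dimension:
- faithfulness: only reflects the hiring intent, nothing invented
- coverage: captures every requirement stated in the intent
- clarity: each qualification is specific and readable
Return JSON: {"faithfulness": int, "coverage": int, "clarity": int, "rationale": str}"""

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def judge(hiring_intent: str, qualifications: str) -> dict:
    # Reference-free: the judge sees only the input and the generated output,
    # no gold answer is required for each evaluated example.
    return json.loads(call_llm(
        f"{RUBRIC}\n\nHiring intent:\n{hiring_intent}\n\nGenerated qualifications:\n{qualifications}"
    ))
```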
The last thing I want to talk about on the agent design is of course LLMs and how we proceeded there. The basics of our LLM choice, if I had to boil it down to a simple binary, would look like this. When you’re building up an MVP, a new product, you want to iterate, especially for something like this where we’re changing the product interface. I mentioned we’re moving from a traditional structured search to more of a natural language-based search. You want to be able to iterate quickly because you want to try out new product designs and also new prompting, new LLMs, new data as part of your chains and everything inside of these agents. That means you want the best instruction following to be the most agile. You’re going to want to go with something like the most state-of-the-art LLM. During some of this development, it was GPT-4o, for example. Then, one thing about this product is we also have areas that are high scale.
For example, if you think about that intake, that’s one recruiter talking to one AI. It scales with the number of recruiters. If we think about evaluating candidates, which is what I’m going to talk about next, for a given job, you might have thousands of applicants, or if you’re sourcing, you might run a search and come back with hundreds of results. If we’re going to evaluate all of those, that’s going to scale with the number of candidates. That’s orders of magnitude bigger scale. There we want to use fine-tuned smaller LLMs. This is showing how that candidate evaluation task works. There’s a tension when you use fine-tuned LLMs to develop something new, which is that fine-tuning works best when you have a well-defined problem statement and you’re just trying to optimize really aggressively to drive up quality within that problem statement. What if you’re changing the problem statement along the way? That’s why we wanted to look at general-purpose instruction following models.
At LinkedIn, we’ve done some work that I want to share to develop a way to try to have both sides of that equation. This is just showing you how it works. We’re actually, for each candidate, taking the qualifications, the profile and resume of the candidate, and then producing an evaluation that says, for each of the qualifications, whether the LLM was able to find evidence for that. If so, it gives you citations that show you where that supporting evidence comes from on the profile or the resume. It has receipts.
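A sketch of what that evaluation step could look like as a single prompt-and-parse call is below; the per-qualification verdict-plus-citations schema is an illustrative assumption about the output format, not the production contract.

```python
import json

EVAL_PROMPT = """Qualifications:
{qualifications}

Candidate profile and resume:
{profile}

For each qualification, return a JSON list of objects:
{{"qualification": str, "met": bool, "citations": [str]}}
Cite the exact profile or resume sections that provide the evidence."""

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # a fine-tuned, smaller model at this scale

def evaluate_candidate(qualifications: str, profile: str) -> list[dict]:
    # One call per candidate; the citations act as the "receipts" mentioned above.
    return json.loads(call_llm(EVAL_PROMPT.format(qualifications=qualifications, profile=profile)))
```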
In building fine-tuned LLMs for this, we’re not starting from scratch. At this point, we had already built out applications on top of generally available, general-purpose LLMs, and they work reasonably well in the jobs domain. The reason is that a lot of job-related data is public. For example, if you go through the Common Crawl corpus, which is used for training GPT and a bunch of other LLMs, you’ll find over 4 million job descriptions that can be identified based on metadata associated with them. That’s from this source here. That means these LLMs are already quite well-versed in things like qualifications and so on. We also did some experimentation here, and as you might expect, domain-specific fine-tuning can still help.
Here, the approach we tried for this at LinkedIn, in partnership with our core AI team, is called multitask instruction tuning. Here we’re taking open-source LLMs, and we’re fine-tuning them on a set of LinkedIn economic graph data. It’s not actually this specific task, which is interesting. We’re trying to train them to preserve their ability to follow instructions, but to understand the type of data LinkedIn has, so to get even better at understanding job postings, at understanding member profiles, posts on our feed, and so on. There’s a blog post there, which you can read for more about that. I’ll just show the flowchart that shows how this worked. EON here is the name that our core AI team gave to this language model. I just want to show the second step there.
For domain adaptation, that’s showing how we train it on a number of different tasks while preserving its ability to follow instructions. What that means is we end up with a system that can outperform open-source models of a similar size, but you can still put in different instructions and iterate as your product experience is evolving. This is a pretty new area. I just want to share that because it’s potentially a way for us to have a bit of both. Of course, the LLM that comes out can still be fine-tuned or distilled. These are all things we’re exploring very heavily right now, how to get even better performance for this task.
Hiring Assistant Tech Stack
One of the things that made this possible that we’re going to hear about next is our LinkedIn agent platform. Actually, all of these sub-agents are implemented on top of a common tech platform. Hiring Assistant itself is on top of that. Because of these shared foundations, just to show an example of how important this was, we were actually able to implement almost in parallel another version of the Hiring Assistant, which is built out of some of the same components, but is targeted at small and medium businesses who have a different set of needs. For example, they may need more help with creating the job description itself, or trying to figure out which types of candidates to reach out to, whereas professional recruiters might be more opinionated.
LinkedIn’s Agent Platform
Karthik Ramgopal: The stuff we will talk about in the agent platform should be broadly applicable to all of you who are trying to build agents. It all starts with prompts. You can think of prompts as really simple strings which you feed to the LLM, but it’s actually way more complicated at scale. You typically have a prompt template with placeholders inside which you fill in content which is specific to each prompt at runtime. You also want to have some guardrail instructions to prevent abuse, for trust and responsible AI. You also want to ensure that when you have many prompts, you have some organization: namespacing, use case-based segregation, versioning if you’re trying to roll out a new version to see how it performs.
Rather than have the developers do all of this by hand, we encapsulated everything into a prompt source of truth service. It’s just a simple facade service which essentially manages all of this for you. Here is an example of what it looks like. We have an app, at the top level you have namespaces within the app, you have use cases under it, and you have different versions. You have various ways to switch between versions, select prompts, load prompts, and it gets registered. We have backward compatibility checks running. A bunch of things come for free so that the developers do not have to do this manually.
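To give a flavor of what such a facade could look like, here is a minimal sketch of registering and rendering a versioned prompt. The class name, method names, and example identifiers are assumptions, not the actual service API.

```python
from dataclasses import dataclass, field

@dataclass
class PromptStore:
    # (app, namespace, use_case, version) -> prompt template with guardrails baked in
    _prompts: dict[tuple[str, str, str, str], str] = field(default_factory=dict)

    def register(self, app: str, namespace: str, use_case: str, version: str, template: str):
        # A real service would also run backward-compatibility checks here,
        # e.g. that every placeholder in the previous version still exists.
        self._prompts[(app, namespace, use_case, version)] = template

    def render(self, app: str, namespace: str, use_case: str, version: str, **values) -> str:
        return self._prompts[(app, namespace, use_case, version)].format(**values)

store = PromptStore()
store.register("hiring-assistant", "intake", "derive-qualifications", "v2",
               "Follow the safety guardrails. Derive qualifications from: {intent}")
prompt = store.render("hiring-assistant", "intake", "derive-qualifications", "v2",
                      intent="Senior accountant in New Orleans")
```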
The other important thing is that every day a new model comes out, and you want to be able to switch models quickly, so you want to abstract out LLM inference as much as possible. We have a choice of models. We use Azure OpenAI models. We also use on-prem hosted open-source models which are further fine-tuned. What we saw is that the OpenAI chat completions API is pretty much a standard right now for LLM inference. We spun up an abstraction which exposes this API. After that we have a configuration-based system via which you can go and change your LLM on the fly and call different LLMs.
The other thing this allows us to do is because we have a single chokepoint for LLM inference, we can do things like quota management because LLMs are expensive, GPUs are limited, so you need to have quota enforcements. You can also do runtime moderation for trust and safety, or responsible AI purposes. Having this layer really helps us. Again, we put all this into an LLM inference service, which is just a proxy with some additional syntactic sugar as you can see.
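Because the proxy speaks the OpenAI chat completions API, a caller can use a standard client and leave model selection to configuration. A minimal sketch, where the base URL, credential, and model alias are illustrative assumptions:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-inference.internal.example.com/v1",  # hypothetical internal proxy
    api_key="service-token",                                   # per-service credential
)

response = client.chat.completions.create(
    model="hiring-assistant-default",   # an alias the proxy resolves via configuration,
                                        # e.g. to an Azure OpenAI or on-prem fine-tuned model
    messages=[{"role": "user", "content": "Summarize this hiring intent ..."}],
)
print(response.choices[0].message.content)
```

Because every call funnels through the same endpoint, quota enforcement and runtime trust moderation can also be applied at that single chokepoint.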
The other thing is the application framework. You do not want people writing prompts and calling LLM inference manually. You can, it’s just not going to scale. Your code is going to look very clunky. We had to pick a framework. What did we pick? We picked LangChain and LangGraph for a variety of reasons. It allows us to chain prompts, interact with memory, invoke skills, which we’ll talk about in a bit, and do LLM inference. What we basically did is we built some really nice adapters and helpers into LangChain and LangGraph so that our developers get all these out of the box and do not have to do this manually. Anytime you pick a framework, whichever framework you choose, ensure you implement the right abstractions so that the developers don’t end up doing it again and again. Here is a 100,000-foot view of a typical control flow inside an agent. You have a prompt, it goes through a planning phase. It generates a task, it gives it to a single-task agent, and the result comes back. It’s an iterative loop. It may need to be replanned, re-executed.
Ultimately, when you’re done you get the response back. You actually have to do this at scale. You can’t run this on a single machine. How do you scale this? Classic distributed systems problem. Apply horizontal scaling to it, spin up lots of instances, and let all these instances process everything in parallel. Classic distributed problems return with a twist, the twist being that everything is very non-deterministic. What are these problems? You have to maintain state consistency across a distributed system. You have to handle traffic shifts: if you shift traffic out of a region, you go to another region. At LinkedIn we do this a few times a day. You have to handle host outages. You’re running something, the host goes down, you have to resume it on another host. You also have to handle ephemeral failures because stuff can fail all the time, entropy is real.
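A minimal sketch of that plan/execute/replan loop, with the planner and single-task agent reduced to plain prompt calls; the prompts and the stopping convention are illustrative assumptions.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError

def run_agent(goal: str, max_iterations: int = 5) -> str:
    results: list[str] = []
    for _ in range(max_iterations):
        # Planning phase: decide the next task, or declare the goal done.
        task = call_llm(f"Goal: {goal}\nCompleted so far: {results}\n"
                        "Next task, or the single word DONE:")
        if task.strip() == "DONE":
            break
        # Hand the task to a single-task agent and collect its result;
        # the next planning pass can replan based on what came back.
        results.append(call_llm(f"Execute this task and report the result:\n{task}"))
    # Synthesize the final response from the accumulated results.
    return call_llm(f"Goal: {goal}\nResults: {results}\nWrite the final response.")
```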
Messaging
How do you solve all these things? We decided to use messaging. What we said is that all the communication between users and agents, and agents and agents, should be modeled as a message. What does the message contain? If you look at the message, it’s a combination of structured and unstructured data. The structured data is information which is already available as structured data. Let us say we have an entire job in a structured way as JSON or something, we just send it in. Why? Because if it’s already structured, why go through the non-deterministic path of converting it into natural language and potentially losing information when you convert it back into structured information. You may use it to interact with an API.
It also contains an unstructured component which is freeform text, which is essentially instructions given to the agent asking it to do something. No, we don’t send prompts. Sending prompts in messages is problematic. What this enables is distributed agent orchestration where you have a user or agent sender. We send the message to a messaging platform. Yes, it’s the same thing which powers LinkedIn messages, literally the same platform. We have a messaging service which essentially passes it into a database. It then tries to deliver it online through a delivery service to an agent orchestrator service. In case the delivery fails it keeps attempting a nearline retry. In case of traffic shifts we use a nearline retargeting service, again through a similar mechanism of a local queue and an aggregate queue, retargeting and delivering. We are able to piggyback on a lot of the resilience mechanisms we have for delivering messages on LinkedIn, which are subject to exactly the same problems as an agentic workload. Ultimately, it reaches the user or agent receiver.
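As an illustration, such a message might carry the two parts side by side. The field names below are assumptions, not the actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMessage:
    sender: str                 # user or agent identifier
    recipient: str              # receiving agent
    thread_id: str              # messages in the same thread are delivered in order (see below)
    instructions: str           # freeform text: what the recipient should do (never a raw prompt)
    payload: dict = field(default_factory=dict)  # already-structured data, e.g. a job as JSON

msg = AgentMessage(
    sender="agent:supervisor",
    recipient="agent:sourcing",
    thread_id="hiring-project-123",
    instructions="Source candidates matching the attached qualifications.",
    payload={"job_id": 42, "qualifications": ["CPA", "5+ years bank reconciliation"]},
)
```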
The reason we put the orchestrator in between is because we did not want people implementing all these messaging abstractions. We just wanted them to implement RPCs. You just implement an RPC endpoint, we’ll call you, under the hood it’s all a message. Execution concurrency is very important. You don’t want it to all be very linear but sometimes you want it to be linear because context matters and reference matters. We have this concept of threads which is again really similar to messaging threads, where we will do sequential First-In-First-Out delivery within a thread. If you want parallelism, you just create multiple threads. This is how you do sequential versus parallel balancing. We use distributed storage to synchronize state. How does the agent remember things? LLMs are stateless so we have to build some form of memory.
Our memory is actually scoped and layered. We have working memory, which is context within a particular task. This is your conversation history as well as the context for that particular task. We also have long-term memory, which may be episodic or procedural. Episodic is what happened in a given episode or a session of interaction. Procedural is recorded information about actions taken by the user. We also have this interesting concept called collective memory, and it’s scoped and layered because you may want to learn things at various levels of granularity.
For example, in the context of the hiring agent, what does a particular recruiter do? What do all recruiters in this company do? What do all tech recruiters do, for example? As you can see, there are various levels of learning. Once you have this memory, you also have to retrieve from it, because context lengths are limited inside the LLM. You can’t throw everything inside the memory into it. Though with Llama 4 the promise is huge context length, so you don’t need precise retrieval, but let’s see. Even if you had a huge context, the problem still remains that it’s really slow and more expensive to process, because, ultimately, you’re bound by the number of tokens.
In general, you will see more hallucinations with a longer context. We have three common techniques for handling this. You keep the most recent N turns for conversations, because most likely you’re referring to something which was said in recent history. You also do summarization. This is useful if you’re dealing with something really old and you want to summarize it, quite like human brains. You don’t remember every detail of what happened a year back. You remember what happened yesterday probably really well. Same thing, it’s an organic form of compaction. One really useful technique which we use heavily is semantic search via vector embeddings so that you can retrieve the most contextually appropriate information.
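Here is a minimal sketch combining two of those techniques, most-recent-N plus semantic retrieval over older turns; the embed() call stands in for whatever embedding model is used, and the function shape is an assumption.

```python
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError  # placeholder for an embedding model call

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_context(history: list[str], query: str, recent_n: int = 5, k: int = 3) -> list[str]:
    recent = history[-recent_n:]                 # keep the most recent N turns verbatim
    older = history[:-recent_n]
    q = embed(query)
    # Semantic search over older turns: keep only the k most relevant ones.
    relevant = sorted(older, key=lambda turn: cosine(embed(turn), q), reverse=True)[:k]
    return relevant + recent
```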
How does the agent do things? Through skills. You may have heard these called tools, but skills are more than tools because skills encapsulate both what and how. They take various shapes and forms: RPC calls, database queries, prompts, even other agents. We are adding support for MCP right now, the Model Context Protocol, which is all the rage right now for calling tools. Originally, we had implemented these pretty manually, and later we realized that it’s not going to scale to implement it manually, so we inverted it. We said, rather than us implementing tool calls into every LLM application, let’s build a central registry, which we call the skill registry, have the source of truth expose these, register them in the skill registry, and retrieve them at runtime.
Right now, rather than every client building tools to call endpoints or other skills, every tool owner, service owner, or agent owner exposes an abstraction, quite like MCP actually, although we predated MCP, which is why we built this. We have a lot of services at LinkedIn exposed over gRPC, so we built some really cool automation for gRPC services, so that developers only have to write some options and stuff automatically gets registered and works. We also built a UI to go and search for skills. We have text-based search and embedding-based retrieval. We also built LangChain adapters for you to load these skills dynamically.
How does this all come together at runtime? You get a task. You say, search for a mid-level engineer. Rather than hardcode the tool, you dynamically figure out from the skill registry what tool is most appropriate, fill in the arguments to it using the power of the LLM, and after that execute it. Your agents have an incredible amount of agency with this approach, since you’re exposing all the tools to them effectively, as opposed to you hardcoding and controlling the flow. If you still want to control the flow, you can, no one’s stopping you. It’s just that if you want to give them agency, you can.
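A sketch of that runtime flow, under illustrative assumptions about the registry's search API and the skill schema:

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def registry_search(task: str) -> list[dict]:
    """Placeholder for the skill registry's text/embedding search."""
    return [{"name": "recruiter_search", "description": "Search the recruiter index",
             "parameters": {"title": "str", "seniority": "str", "location": "str"}}]

def execute_skill(name: str, arguments: dict):
    """Placeholder for invoking the registered RPC, database query, or prompt behind the skill."""
    raise NotImplementedError

def run_task(task: str):
    # Retrieve candidate skills, let the LLM pick one and fill its arguments, then execute.
    skills = registry_search(task)
    choice = json.loads(call_llm(
        f"Task: {task}\nAvailable skills: {json.dumps(skills)}\n"
        'Pick one skill and fill in its arguments. '
        'Return JSON: {"name": str, "arguments": dict}'
    ))
    return execute_skill(choice["name"], choice["arguments"])

# run_task("search for a mid-level engineer")
```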
Observability is another thing which is really important. This describes our overall flow. As in classic distributed systems, we have various forms of observability: traces, events, logs. The problem with these agentic systems is that they are inherently non-deterministic, so you can have various different flows going on all the time. Your classic observability doesn’t work that well. It’s really hard to get insights. We had to build various forms of sophisticated aggregation which is then fed into industry standard systems like OTel, in order to generate a trace. We again do further processing on it in order to generate analytics dashboards, which developers can consume. We have two use cases for it: one is when you’re debugging locally, where you can visualize what happened to a single invocation. The second is in production at scale, to look at an aggregate set of use cases.
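As a small illustration of the kind of instrumentation that can feed such aggregation, here is a sketch using the standard OpenTelemetry API; the span and attribute names are assumptions.

```python
from opentelemetry import trace

tracer = trace.get_tracer("hiring-assistant")

def run_step(agent: str, step: str, fn, *args):
    # Wrap each agent step in a span so a single invocation can be visualized
    # locally, and aggregated dashboards can be built in production.
    with tracer.start_as_current_span(f"{agent}.{step}") as span:
        span.set_attribute("agent.name", agent)
        span.set_attribute("agent.step", step)
        return fn(*args)
```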
Lessons Learned
What are some of the important lessons we learned? The first is, standardize a bunch of these common things, automate boilerplate, and democratize access for your developers. You do not want them doing the redundant parts of the job again and again when building agents. You want them to focus on the actual job of building the agent, just like any other framework. Another very important thing: most of the time, if you’re doing simple workflow automation, you may not even need an LLM. You don’t even need AI; procedural code works a lot better, and it’s more stable and faster. Please be careful. Only if you need reasoning, go to the left; otherwise stay on the right, which is just using procedural code. Again, only if you have a quality, scale, or latency constraint, try to pick a custom model; otherwise simply use a commercial cloud-hosted model, because the overheads simply aren’t worth it. Here is the model customization pyramid.
Again, use the techniques at the bottom of the pyramid as much as you can, and the techniques at the top as little as you can, because they become really expensive and hard to maintain, and the models are evolving quickly, so you need to be very thoughtful about what you’re doing. What this essentially means is, as long as you can, use Retrieval-Augmented Generation, prompt engineering, cache-augmented generation if you want, and start doing various forms of fine-tuning, pre-training, or ground-up training only if you really need to. It is still software deployed to production, it’s not a prototype, so please build for availability, resilience, and scale. You need robust evaluations, like Daniel called out.
Trust, privacy, security, responsible AI, all these things are important. You also need observability. Without observability, it’s really easy to fly blind when things go wrong, because everything at the end of the day is very non-deterministic. This is very important, so this slide shows an elephant dancing. The space is changing really fast, so it’s important to ensure that whatever you build can also adapt really fast. There are two examples I’d like to give. We picked LangChain and LangGraph; however, the way our software is built, it’s essentially layered. We have these lower-level platform components and we have LangChain and LangGraph adapters.
Today we like it, tomorrow some other framework comes out or we want to try something, we have the ability to switch relatively cheaply without changing the entire ecosystem. Another example is MCP was not a thing last year when we built this, but right now it is a thing, so our skill registry can be integrated into MCP. In general, always try to buy, don’t try to build. Only try to build if it’s simply not available, because, again, the space is moving really fast, new things keep coming out every day. If you invest a lot in your own stack, it may be really hard to adopt new things.
UX lessons, this is also interesting. A lot of apps just stick a text box in and have the users enter things into it. Sounds ideal, but actually it doesn’t work, because it’s a lot of overhead for users to figure out, “I have a text box, what do I type into it? I don’t know if it’ll work or not”. You sometimes have to provide finer-grained controls. Here you can see an example of our AI-enabled message automation. We have a text box for the humans to edit the AI-generated message, but if you look at the personalization filters, rather than have the human type it out, it’s easier to provide some clickable boxes they can click on. Progress indicators: a lot of these AI agents do heavy work, and it’s asynchronous. If you’re going to show spinners, or if you’re just going to show a static UI while the user keeps waiting, they will get irritated. One of the lessons we learned is to show these spinners and progress bars only if you’re going to respond reasonably fast; otherwise it’s better to use an explicit async flow with some ETA, and use an async notification-like mechanism to inform users when the task is done.
Questions and Answers
Participant: In a skill registry, if agents by default have access to all these skills, it would be easy for a developer to register a skill for a service which indirectly has the ability to, say, destroy data, and now every AI agent implicitly has this power that no one planned for. Talk a little bit about the security of the skill registry.
Karthik Ramgopal: Every agent does not have access to every skill. The way it works is we have this concept of service principals at LinkedIn, which is used for determining which client has authorization to call which endpoint. Those are transitively applied even for agents. If you build an agent, you have to have the right service principals registered to be able to call the skill, which is how we enforce this.