Transcript
Introductions [00:27]
Srini Penchikala: Hi, everyone, my name is Srini Penchikala. I am the lead editor for the AI/ML and data engineering community at the InfoQ website, and I’m also a podcast host. Thank you for tuning into this podcast. In today’s episode, I will be speaking with Nikolaos Vasiloglou, who is currently the vice president of research ML at RelationalAI, the company behind the knowledge graph processor for the data cloud.
Technologies like generative AI, large language models or LLMs, and retrieval augmented generation or RAG have been getting a lot of attention in recent years. Graph databases and knowledge graphs add a new dimension to developing RAG-based applications and make those applications more powerful and valuable in the data analytics space. Nik and I will discuss knowledge graphs and GraphRAG in this episode. Hi, Nik. Thank you for joining me today. Can you introduce yourself and tell our listeners about your career and what areas you have been focusing on recently?
Nikolaos Vasiloglou: Hi, Srini. Thanks for hosting me today. So, yes, as you mentioned, we live in exciting times with GenAI. I think there’s a big elephant in the room right now: how do we get our returns on investment in the enterprise world? And this is exactly the topic, the area that I’m working on. I’m approaching it from different angles. As you know, I have strong ties with academia. I go to conferences. And at the same time I interface with Fortune 500 customers who are trying to solve problems with GenAI. So, the biggest question is how we can create value, and at the same time at a reasonable cost. Cost is part of the equation. And GraphRAG seems to be one of the directions that is starting to create returns. In fact, I think it’s leading us into… As we break down the problem, we have identified smaller problems that can be reused in other places as well. So, that’s what we’re going to talk about today.
Terminology [02:23]
Srini Penchikala: Sounds good. Yes, let’s start with those topics. So, most of our listeners know what an object graph is and what a graph database is, but they may be new to concepts like a knowledge graph and, obviously, graph-based retrieval augmented generation. So, can you please define these terms? What is a knowledge graph? What is RAG, and then what is GraphRAG?
Nikolaos Vasiloglou: Yes. So, first of all, the knowledge graph is an old idea. It has been transformed over time and has gone through winters and summers. I think the most recent summer was in 2012, when Google introduced the concept as “things, not strings”. Basically, instead of just presenting the search results when you searched for, say, Marie Curie, they would create an info box that had the information organized around entities, relations, and properties. Date of birth, husband, born, died, and all that stuff, organized into a more structured form.
Now, the best way to describe a knowledge graph conceptually is a definition a friend of mine, Michael Bromley, gave: a knowledge graph is a language that both humans and machines understand. I used to work with him, and after hearing different discussions around it, that’s where he ended up. And I think that’s why they’re becoming so valuable these days, because we have language models and humans trying to work together. And that’s the glue.
You asked me about RAG. It’s a term that was coined not that long ago. It stands for retrieval augmented generation, and it comes from the question answering domain. Basically, when you’re trying to answer a question, you’re trying to create a context. You’re asking a language model a question. The language model might know about this based on the data it was trained on, but it might not, so you want to provide the context out of which you want it to create the answer. Everybody’s familiar with search engines. With search engines, you would ask a question, they would retrieve the text, and you had to derive the answer. Given the text, you’d say, okay, that’s what the answer is.
Now, there’s an extra step: you take that relevant context, the web pages that the search engine would return to you, you put them into a language model, and then it forms the answer. Of course, the biggest difference is that search engines in the past used keyword search in order to retrieve the information. Now they use something called embeddings. It’s a way of vectorizing, condensing a whole document into a single numerical vector. And of course, that’s a much better way of retrieving relevant pieces of text for the answer. What was the third concept you asked me to define?
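To make the retrieval step Nik describes concrete, here is a minimal sketch of vector-based RAG in Python. The hashed bag-of-words `embed` function is a toy stand-in for a real embedding model, and the final prompt would normally be sent to a language model; both are illustrative assumptions, not any particular product’s API.

```python
import math
from collections import Counter

def embed(text: str, dim: int = 256) -> list[float]:
    """Toy embedding: hashed bag-of-words. A real system would use a
    trained embedding model; this stand-in just keeps the example runnable."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

documents = [
    "Marie Curie was born in Warsaw in 1867.",
    "Alexander the Great was born in Pella in 356 BC.",
    "Napoleon was born in Corsica in 1769.",
]
index = [(doc, embed(doc)) for doc in documents]

question = "When was Alexander the Great born?"
q_vec = embed(question)

# Retrieve the top-k most similar chunks and build the context.
top_k = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]
context = "\n".join(doc for doc, _ in top_k)

# The retrieved context plus the question would be sent to a language model.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

A production system would swap the toy embedding for a trained model and keep the vectors in a vector database rather than a Python list.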
Srini Penchikala: Oh, the third one is graph-based RAG techniques.
Nikolaos Vasiloglou: Yes. GraphRAG is something that keeps changing. I think it’s about a year old, if I remember. The concept started maybe February last year, from Microsoft, through a blog, and then there was a paper. I think that’s how it went. The way that RAG retrieves information is by vectorizing your question, converting it into a vector, and then trying to find similar vectors from texts, retrieving them, and then answering. People said, “Well, what if we actually recognize some entities that exist in the question?” If we have built a knowledge graph, the knowledge graph is basically statements: entities that have properties, labels, or relations. Can we find the entities that are in this question, go back to the knowledge graph, and find them with an exact keyword match? Because now the entities are always the same.
Nik Vasiloglou is an entity, a concept, and it’s the same everywhere. You can see what we know about this entity. And instead of retrieving just the text with vectors, you retrieve all the texts that are associated with these entities and then answer the question. So, GraphRAG was initially conceived in order to do a better retrieval of the text through the knowledge graph. Now, this has changed over the years, and I will explain later.
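Here is a rough sketch of that entity-first retrieval, assuming a tiny in-memory knowledge graph of triples and a naive substring entity matcher; a real GraphRAG pipeline would use an extraction model, entity resolution, and a graph database instead.

```python
# Knowledge graph as (subject, relation, object) triples, plus the passages
# each entity was extracted from.
triples = [
    ("Alexander the Great", "born_in", "Pella"),
    ("Alexander the Great", "nationality", "Greek"),
    ("Napoleon", "born_in", "Corsica"),
    ("Napoleon", "had_general", "Count Demetrios"),   # fictional, as in Nik's example
    ("Count Demetrios", "nationality", "Greek"),
]
passages = {
    "Alexander the Great": ["Alexander the Great was born in Pella in 356 BC..."],
    "Napoleon": ["Napoleon Bonaparte was born in Corsica in 1769..."],
}

known_entities = {s for s, _, _ in triples} | {o for _, _, o in triples}

def find_entities(question: str) -> list[str]:
    """Naive entity recognition: exact substring match against known entities.
    A real pipeline would use an NER model plus entity resolution."""
    return [e for e in known_entities if e.lower() in question.lower()]

def graph_context(question: str) -> str:
    entities = find_entities(question)
    facts = [f"{s} {r} {o}" for s, r, o in triples if s in entities or o in entities]
    texts = [t for e in entities for t in passages.get(e, [])]
    return "\n".join(facts + texts)

question = "What do Napoleon and Alexander the Great have in common?"
# Facts reachable from both entities go into the prompt alongside the raw text.
print(graph_context(question))
```

The point is that retrieval keys on shared entities rather than vector similarity alone, so connections between Napoleon and Alexander the Great can surface even when no single passage mentions both.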
GraphRAG [06:31]
Srini Penchikala: Since GraphRAG has been getting a lot of attention lately, can you talk about some good use cases? What type of applications are good for using GraphRAG and what applications are not good candidates for using GraphRAG?
Nikolaos Vasiloglou: Everything has to do with question answering. This is kind of like the holy grail, and that’s why you see it right now. A lot of companies talk about generalized RAG, like AutoGluon from Amazon, I don’t think it’s open source anymore, it’s a project from Amazon, it’s a service. Basically, they mean: I want to ask you a question, any question, and get an answer. And questions can be as simple as point questions, where I’m trying to retrieve something, like when was Alexander the Great born? You have to find where it is mentioned and retrieve that. And then you can have more complicated questions where you’re asking about something like Alexander the Great and Napoleon: what are the common characteristics of Napoleon and Alexander the Great?
Now, there are probably not many texts talking about that, but there’s definitely a connection. And by doing entity extraction from all the documents about Napoleon and Alexander the Great, maybe you can find paths in that graph that connect them. Maybe they went to the same places. Maybe there was a general of Napoleon who studied or had… I don’t know, I’m making that up. He was also Greek and Alexander the Great was Greek. So, this is a question that requires a little bit of a longer reasoning path, and in that particular case you have to traverse a graph. Now, there are different flavors of GraphRAG. And sometimes what GraphRAG does during indexing is try to find communities in the graph, bring them together, go back to a language model and say, “Well, these things seem to be connected, being in the same community through entities or relations. Can you create a summary or produce new information about that, like distill new knowledge?”
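As a rough sketch of that indexing step, the snippet below builds a small entity graph with networkx, groups it into communities, and forms a summarization prompt per community. The toy edges and prompt wording are assumptions for illustration; implementations such as Microsoft’s GraphRAG use their own hierarchical community detection rather than this exact call.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Build an entity graph from extracted relations.
g = nx.Graph()
g.add_edges_from([
    ("Alexander the Great", "Pella"),
    ("Alexander the Great", "Aristotle"),
    ("Napoleon", "Corsica"),
    ("Napoleon", "Egypt"),
    ("Alexander the Great", "Egypt"),   # both campaigned in Egypt
])

# Group related entities, then ask a language model to summarize each community.
for community in greedy_modularity_communities(g):
    members = ", ".join(sorted(community))
    prompt = (
        "These entities appear connected in the knowledge graph: "
        f"{members}. Summarize what links them and note any new fact "
        "that can be inferred."
    )
    print(prompt)  # in a real pipeline this prompt would go to an LLM
```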
Now, of course, there can be more difficult questions, which would be: I am Apple, and what’s going to be the effect of tariffs in Vietnam on my sales in Europe? Or it can be, I don’t know, Stanley Black & Decker, and there’s probably a factory producing something in Arizona, and you say, if there’s a fire in Arizona, how should I change my supply chain? Or does it make sense to buy more slots at this price at this manufacturing facility? Now, that requires an integer program to be solved. And of course, it can be a question like, where was the closest planetarium to Atlanta in 1953? That’s a question that requires agents to go and search Wikipedia, find out, understand the geographic proximity. It starts, well, did Alabama have a… go find, like, did, I don’t know, North Carolina…?
So, if I gave you this as a project, you would have to go, and search, and come up with an answer. Now we have agents. Agents could do that. So, I just started from a simple question, when was Alexander the Great born, and then we went to even… And it can even be, I’ll just close with that, find me a new material that has these properties, or a new antibody. There are systems, they still call them RAGs because they’re question answering, that will have to go and invent a new protein, a new medicine, a new product. So, the interface is the same. It’s always the Google interface query box. You just go and you type a question. Now, as you can understand, there’s a lot of complexity in how to answer different things.
Srini Penchikala: Yes, that’s good. So, definitely a lot of value that the graph brings to the RAG space. In terms of implementation, what are the differences between a vector-based RAG solution and a graph-based one? Is it just that traditional RAG solutions use a vector database and GraphRAG uses a graph database, or are there more differences than that?
Nikolaos Vasiloglou: I think in principle, GraphRAG is built on top of RAG. So, first of all, let me say that for the majority of the questions that you’re going to see in the enterprise, plain RAG can be pretty good. And sometimes there are some unfair comparisons between them. If the questions are very easy, you’re not going to see any difference between GraphRAG and RAG. So, vectorization is always needed, because even if I ask what was the previous job of Nik Vasiloglou, there is no entity Nik Vasiloglou in your knowledge graph. It’s probably Nikolaos Vasiloglou, and maybe there are some synonyms. So, some kind of entity resolution needs to happen there. What the system would probably do is some vector search to find the closest entity to Nik Vasiloglou, that is, do the entity resolution. So, you will still need vector databases.
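A toy version of that entity resolution step is sketched below, with a fuzzy string match from the standard library standing in for the vector search he mentions; in practice you would embed the mention and the canonical names and query a vector index instead.

```python
from difflib import get_close_matches

# Canonical entities already in the knowledge graph.
canonical_entities = [
    "Nikolaos Vasiloglou",
    "RelationalAI",
    "Marie Curie",
]

def resolve_entity(mention: str):
    """Map a mention from the question to its canonical entity, if any.
    Fuzzy string matching stands in for embedding similarity here."""
    matches = get_close_matches(mention, canonical_entities, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(resolve_entity("Nik Vasiloglou"))   # -> "Nikolaos Vasiloglou"
print(resolve_entity("Relational AI"))    # -> "RelationalAI"
```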
There’s also one component that hasn’t been highlighted enough, which is the re-ranker: after you bring back the facts, how do you decide what to keep and what to throw away? That has a big impact. And graph retrieval is another way of retrieving information. But I want to highlight a shift in the market for the part of the audience that is more aware of RAG and other question answering solutions. In the beginning, people would build a knowledge graph and use it only to retrieve text. They wouldn’t really use it for the facts. But let’s not forget that the knowledge graph is structured knowledge. That means you can store it in a graph database, which means that you can write a formal query.
So, when you want to index information, let’s say something very popular in the financial industry like SEC documents, there is some information that, when you create a knowledge graph, can be very well formalized: the net revenues, the losses, the repurchases of stock, and all this stuff. To explain it to users who have never used a knowledge graph: just think of it as facts that you can take and store in a SQL table, in relational form. That means that when I ask what was the average net revenue of the Fortune 500 in Q3, this is a question that you can transform into a formal query no matter what system you’re using, whether that is SQL or GQL, or a graph database, or any other language.
And you can go and execute it. As long as the query is formed correctly, because there’s always a chance you can form the wrong query, once it gets executed on a database it’s going to be accurate, because databases are symbolic systems. There’s no hallucination there. Now, at the same time, if you ask a question on an SEC document like how the war in Ukraine affected the revenues or operations of Nvidia, this is not an easy question. It’s not impossible, but it’s not easy to transform into a formal query. There’s no way you can write those facts and organize them in a graph database or relational database, or the relational graph database that we build, in a form that’s easy to write, verify, and execute. So, in that particular case, you would probably need to bring back the paragraph that talks about that. So, that’s kind of the right system for that case.
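Here is a small sketch of the structured side of that, using SQLite from the standard library: extracted facts sit in a relational table, and the “average net revenue in Q3” style of question becomes a formal query that executes deterministically. The table layout, company names, and numbers are made up for illustration; in a real system a language model would generate the SQL (or GQL) from the user’s question and the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE financial_facts (
        company TEXT, quarter TEXT, metric TEXT, value REAL
    )
""")
conn.executemany(
    "INSERT INTO financial_facts VALUES (?, ?, ?, ?)",
    [
        ("Acme Corp", "Q3", "net_revenue", 1200.0),
        ("Globex", "Q3", "net_revenue", 950.0),
        ("Acme Corp", "Q3", "losses", 80.0),
    ],
)

# In practice an LLM would translate the user's question into this query;
# once formed correctly, execution is symbolic and cannot hallucinate.
query = """
    SELECT AVG(value) FROM financial_facts
    WHERE metric = 'net_revenue' AND quarter = 'Q3'
"""
print(conn.execute(query).fetchone()[0])   # -> 1075.0
```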
Now, what we’ve seen is that companies are asking for more and more accurate and well-structured knowledge graphs, because they would prefer to ask questions that can be transformed into a formal query, like SQL or any other query language, and use that. That’s for several reasons. First of all, because they feel this is something they can audit. They can see the query, whether it’s SQL or GQL or whatever, and tell if it’s right or wrong. Second, because the knowledge graph that you’re producing from GraphRAG can also be merged with the symbolic knowledge graph you have from your database. People call that the semantic layer. So, now you can ask a question that retrieves information from both your structured and your previously unstructured data. That’s a big thing.
And of course, in order to do that you need to do some kind of entity resolution, entity linking. If you have a product that’s called, I don’t know, iPhone 16 or “latest iPhone” in your document, and then it has a different name, it could be Apple iPhone in your database, you want to make sure it’s the same thing. You want to do some entity linking there, so you can have one graph and ask your questions. And that’s where we see a lot of value from GenAI. The fact that you can now… So, I’m sorry, I keep going on about this, but I think it’s very important. Everybody talks about AGI. And there’s this race of who’s going to have the best math reasoner and score better on math benchmarks. That’s fine. And everybody’s watching Formula 1 to see if Ferrari is going to beat the other companies, Honda or the others.
But in the end, you don’t buy a Ferrari. You need a car that’s safe, and accurate, and reliable, and all that. What Ferrari, or Honda, or Porsche, or Volkswagen, or whoever, I’m not that familiar with F1, are doing over there does impact the quality of your car. But in the end, you’re not getting the latest and greatest; you’re getting something that’s reliable and works. So, as you’re building the graph, you realize that before, in order to do entity linking or entity resolution, you needed a pipeline that was very tedious and difficult to maintain. It required a team to monitor it. And now, I can use a language model to do that with a vector database very quickly, and everybody can do that. It’s at your fingertips.
So, in my opinion, this is one of the first returns that we see from GenAI in the enterprise, because it’s now a process. It’s now something that anybody can do very quickly at any level. It’s exactly the same thing we saw with classifiers and predictive models. Back in 2000, there were companies building predictive models, but you required a team of data scientists and software engineers to code them. Fast-forward to 2010, 2015, there’s XGBoost and Scikit-Learn, and deploying a classifier is one push of a button. There’s automatic feature engineering. It becomes much simpler, and now you can have predictive models everywhere in your organization. That’s what we see right now: knowledge graphs are going to be everywhere in your company.
Srini Penchikala: So, for companies that don’t already have an established knowledge graph or graph database infrastructure, will they need to take care of that first before they can start leveraging GraphRAG solutions?
Nikolaos Vasiloglou: No. Microsoft has open-sourced GraphRAG, so if you want to go and use it just for question answering, you can download it. We also offer an implementation and deployment in Snowflake, if you want. There are different Neo4j offerings. A lot of companies offer their own deployments of GraphRAG. So, if you just want it for question answering, you can just download it and deploy it. However, if you want to invest in knowledge graph construction and use the knowledge graph for other applications, then it’s a little bit more work. There are not that many out-of-the-box solutions being built right now. This is the project that I’m working on: some kind of automated knowledge graph for structured and unstructured data. Even your database needs to come into this graph form so that semantically it can be more useful.
GraphRAG Pipeline Components [17:49]
Srini Penchikala: More useful, right, yep. In terms of implementation, what are the components that a typical GraphRAG pipeline has? Obviously, you mentioned a vector database is a prerequisite, and then a graph database. What are the other components that we need to implement this kind of application?
Nikolaos Vasiloglou: Yes. So, you definitely need a graph database and you need the knowledge graph construction. The knowledge graph construction is a sequence of, let’s say, LLM interactions, where you give instructions and you take the documents. First of all, there’s the parsing. The parsing is very, very important. If you have plain text, that’s very easy, but I think most of the information is stored in PDFs. Now, documents are rich. They can have graphs, they can have tables, they can have multi-column layouts. And there’s a bit of a trade-off, because you can use some Python PDF libraries to do the parsing, but very soon you’re going to hit irregular pages, which you need to ship to visual parsing. OpenAI and Anthropic have some pretty powerful but also expensive APIs, and others do, too. So, you have to be careful, because if you just try to parse everything visually, first of all, visual parsing has…
If it’s pure text, it’s better to do it symbolically, because you’re not going to have any errors. Visual parsing can always have errors. So, building that component requires a bit of engineering, but it’s very important. If you get that wrong, then you produce garbage, and that’s it, you’re going to get garbage out. Then, for the knowledge graph construction, you have to split. So, there’s the splitter, where you have to split your documents into sections, or chunks, or pages. How you split also matters. And then when you’re doing the knowledge graph construction, one of the difficult things is this: if you’re lucky and your document is small and it fits in the context of the language model, you can upload all of it and start prompting, giving instructions, getting output, testing the output, and refining it. It can be quite lengthy. So, that’s another knob to tune, because you also don’t want it to be very expensive.
But if your document is long, SEC documents can be 60 pages, you want to parse it by sections. But you also need to keep some context along the way, some metadata, so that when you’re reading just one page, you know what it’s talking about. In theory, with RAG and GraphRAG, creating a demo is easy. But if you want to push them into production, there are all these details. And then after you build the knowledge graph, you have to do some type of community detection or path creation in order to start revising the information, combining information from different paths, and creating new pieces of knowledge that you are later going to vectorize. And in some cases, we’re also using a different type of knowledge graph. So, there’s something I forgot to mention.
There is a knowledge graph where you have entities, relations, and properties, something we call the Wikidata format. There’s usually a central statement and then some qualifiers. Nik works at RelationalAI, and then there are qualifiers like started in 2018, or ended at, or location. The central statement is that you work at RelationalAI; the qualifiers clarify it. So, you distinguish between being a VP of research ML versus being a VP of engineering before. But there’s another graph that we found useful. It’s called the entailment graph, where it says this implies that. An entailment is like a logical declaration, and we’ve realized it can be useful in some cases to provide entailments. Especially when you’re working with documents that have customer support and customer data: it says the button color is red implies restart, and that’s a different way of expressing logic. So, roughly, these are the components of GraphRAG.
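A sketch of the two representations he mentions, using plain dataclasses: a Wikidata-style statement, a central triple plus clarifying qualifiers, and an entailment edge recording that one statement implies another. The field names are illustrative, not the actual Wikidata or RelationalAI schema.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """Wikidata-style statement: a central triple plus clarifying qualifiers."""
    subject: str
    relation: str
    obj: str
    qualifiers: dict = field(default_factory=dict)

@dataclass
class Entailment:
    """Entailment edge: the premise implies the conclusion."""
    premise: str
    conclusion: str

work = Statement(
    subject="Nik Vasiloglou",
    relation="works_at",
    obj="RelationalAI",
    qualifiers={"start_year": "2018", "role": "VP of Research ML"},
)

rule = Entailment(
    premise="the button color is red",
    conclusion="restart the device",
)

print(work)
print(f"'{rule.premise}' implies '{rule.conclusion}'")
```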
Tools and Open Source Frameworks [21:52]
Srini Penchikala: And to get started with this, for the listeners who want to try this in their local environment and not in production, obviously, what are some of the tools and open source frameworks you recommend?
Nikolaos Vasiloglou: I think in terms of open source, as I said, Microsoft has an open source package. Platforms like LlamaIndex and LangChain also provide some kind of support; they provide the components. Then there are a lot of open source or free databases, like Weaviate and ChromaDB and others. Of course, there are free tiers from commercial ones, like Pinecone. And you can find a lot of re-rankers on Hugging Face. Or if you want managed solutions, companies like Together AI and SambaNova and others operate the components. Or if you want, you can always open ChatGPT, take all your documents, throw them in there, and start asking questions. That’s also not bad.
Srini Penchikala: And the questions can even include what tools to use, right?
Nikolaos Vasiloglou: Yes.
Responsible GraphRAG [22:49]
Srini Penchikala: With all these AI applications and solutions especially, accuracy and consistency are very critical. So, can you discuss how we can ensure the accuracy of the output from GraphRAG systems, so there are minimal hallucinations and biases?
Nikolaos Vasiloglou: First of all, let’s not forget that there’s no enterprise system without a human in the loop. So, definitely, the knowledge graph construction is the place where you can invest a lot in terms of getting accurate facts. There are several iterations you have to run, and you can do different things. You build it with OpenAI and you have an Anthropic model verifying, saying, well, you found that fact, but if I give you that fact, can you actually find it on the page? Can you show me where it is? You go through different checks… And in the end, you have a human who can review some of the facts in a grouped manner and say, okay, all of these look okay because they are in the same category and it got that right, or here it got it wrong and it needs to be revised.
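The cross-model verification loop he describes might look roughly like the sketch below: one model extracts facts, a second model is asked whether each fact is actually supported by its source page, and anything unsupported is queued for human review. The `verify_with_second_model` function is a placeholder for a real API call to a second provider, faked here with a substring check so the example runs.

```python
def verify_with_second_model(fact: str, source_page: str) -> bool:
    """Placeholder for a call to a second, independent LLM that answers:
    'Is this fact supported by this page?' Here we fake it with a
    substring check just to keep the sketch runnable."""
    return all(word.lower() in source_page.lower() for word in fact.split()[:3])

extracted_facts = [
    ("Acme Corp net revenue in Q3 was 1200", "page 12: Acme Corp net revenue in Q3 was 1200..."),
    ("Acme Corp acquired Globex in 2024", "page 12: Acme Corp net revenue in Q3 was 1200..."),
]

needs_human_review = []
for fact, page in extracted_facts:
    if not verify_with_second_model(fact, page):
        needs_human_review.append(fact)

# Unsupported facts are grouped and sent to a human reviewer
# rather than silently entering the knowledge graph.
print(needs_human_review)   # -> ['Acme Corp acquired Globex in 2024']
```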
And usually, the source of hallucination, or the problem, comes from the fact that your instructions are not accurate, or there are exceptions that the instructions alone cannot capture, and the language model goes and does whatever it thinks is best. You would have pretty much similar problems even if you didn’t have a language model and you were asking humans to go and extract facts; you would face the same issues. This is another thing some people get confused about: they count as errors things that come from the uncertainty or ambiguity of human language. Remember, we have one problem here: human language is much more expressive than a knowledge graph. A knowledge graph, representing things in triples with maybe some scores and maybe some rules, does not have the same expressive power as human language.
And human language can also express statements that have ambiguity and uncertainty. So, things can be lost in translation. Unfortunately, we don’t have a good way to express that. There are systems like MiniCheck and other checkers where you provide the context and the answer and ask, well, is the answer supported by the context? That’s typically the part where the hallucination shows up. My experience is that if, instead of returning text, you return fact statements, empirically it tends to hallucinate much less. You make it simpler for the language model, that sort of thing. And of course, the other option is to use the question answering system as decision support and not as the truth.
As I mentioned before, you can go to a language model, a question answering system, and say, what do you think are the implications of, say, a potential war between India and Pakistan for my supply chain, or something like that? It gives you something. But in the end, the decision is yours. Or say it invents a new medicine: you’re not going to take the medicine and put it on the market. You’ll have to check it. So, staying in brainstorming and imagining or thinking is probably the safest mode, where basically the hallucination is a feature, not a bug.
Srini Penchikala: It’s trying to tell you something about your own data, right?
Nikolaos Vasiloglou: Yes. It’s trying to tell you something you never thought of, that sounds counterintuitive or sounds wrong, but maybe it’s not.
Security and Privacy [26:03]
Srini Penchikala: Continuing on the ethics side of the discussion, what type of security, privacy, or governance measures should be implemented to protect sensitive data within knowledge graphs and GraphRAG systems?
Nikolaos Vasiloglou: Yes, that’s a very interesting question and I’m glad you’re asking it. I work on databases, so the first-class citizen for me, I always assume, is data that comes from a relational database. And as you know, there has been a big investment over the years to create roles, and permissions, and structure, because when you have a database, you don’t want people to have access to data they’re not supposed to have. And we have somehow forgotten about that.
Companies right now take all their data, from Slack, from email, from Zoom transcripts, and they throw it all in there. And there’s always the risk that people might see things they’re not supposed to see. Now, if the data is coming from a database, you don’t have that problem, because everything goes through the database when it’s being retrieved, and that’s already covered by the governance you set on the data: who can see what.
So, when you are retrieving facts from a relational database, it’s just going to return what you’re allowed to get. Which basically moves the problem: okay, I’m now getting data from Slack, from, I don’t know, Confluence, from all these different sources. There is some work that needs to be done, if you want, on transforming them and moving them to the right place with the right permissions. I don’t have any anecdotes, but I’m sure people have run into unpleasant surprises where they realized that things that weren’t supposed to be visible to some people were. I mean, think about how many times you hear about a company accidentally making an S3 bucket public that contains passwords.
Imagine. And those are people with an IT culture, and they still make mistakes like that. You can imagine how much more difficult it can be with RAG systems. So, a lot of attention needs to be paid. And of course, the other thing is safety and alignment, which means that once you put all this behind a language model, the model might start saying things it’s not supposed to say. So, you have to make sure that people are not abusing it. There’s a whole class of companies working on that, providing some kind of safety.
Srini Penchikala: Right, yes. Making sure what it’s saying is appropriate and all that, right? So, yes.
Nikolaos Vasiloglou: Yes.
GraphRAG Limitations [28:31]
Srini Penchikala: Okay. So, what are some limitations of GraphRAG? What kind of applications are not good candidates for using this?
Nikolaos Vasiloglou: As I mentioned, there’s a whole spectrum of questions that you can ask, and the more difficult the questions, the more resources you need. Especially if you’re asking how many slots should I buy at these facilities, or how should I schedule my production to reduce supply chain costs by 10%, it can provide you an answer, but verifying that answer and explaining that answer is not trivial. It also becomes a user experience problem. Let’s say you’re an expert: you go to a specialized consultant to solve the problem, and in the end they come up with, I don’t know, a 100-page report, and a PowerPoint, and several sessions to explain it to you. I don’t think that’s something to avoid, but I don’t really know how a RAG system can do that in a way that’s convincing. I think there’s also a lack of trust, especially with where we’re moving.
With RAG, it’s a decision support system, not a decision execution system. It means it’s giving you an answer, but the final decision is yours. You can ignore it, you can question it, you can do whatever you want, or you can accept it. But right now, even a simple task, saying, go and book me a restaurant with a view of the sea for no more than $500 and blah, blah, blah, it has to actually go and use your credit card, and it has to deal with scammers, with fake reviews and all that stuff. I mean, you probably trust your friends, I don’t know how much you trust this. So, we’ll need some time to build trust. And again, the other thing is, in the enterprise quite often you need deterministic behavior, but LLM systems are not deterministic. They’re probabilistic. They can get you to 99 or 99.9 percent, but that might not be acceptable for your system.
So, trying to solve a problem that requires determinism, like, you’re not going to fly an airplane with a language model, that won’t happen for now. It’s an issue. And that’s why it’s recommended to use RAG or GraphRAG systems in places where you can have a verifier. That’s why it has been so successful in mathematics: the model provides a proof for a theorem, and then there’s a theorem prover that can execute it and say this is a correct proof, you didn’t skip a step, you don’t have a logical leap here. So, yes, that’s one of the limitations.
AI Agents Role [30:58]
Srini Penchikala: You mentioned decision support versus decision execution systems. The recent trend in the generative AI space is AI agents. So, can you talk a little bit about AI agents and GraphRAG? What is the best combination for using those?
Nikolaos Vasiloglou: There’s been a review paper, I think it was from Google, on agentic RAG, where the idea is you ask the question and then decompose it into simpler tasks. And you have different agents, and each agent goes and completes a task, and they come back. The problem I mentioned, where was the closest planetarium to Atlanta in 1953 or ’58, or something like that, that’s a task that requires agents to go and do it. Also, keep in mind that a competitor to GraphRAG and RAG is these reasoning models, like DeepSeek R1, models that somehow have the graph in their memory and their reasoning. They move from one place to the other, they backtrack and hop to another reasoning node. Of course, you can always create a reasoning model that instead of using… I think that’s something that I haven’t seen.
It would be interesting to create a reasoning model that is not using its own knowledge, but is based strongly and only on a graph that you have provided. People do that implicitly by having the model produce steps and then invoke a graph or a verifier to check whether that reasoning path was correct or not. But it would be interesting to have a reasoning model that uses a knowledge graph as the only tool, or as the main tool, to provide an answer.
Emerging Trends [32:39]
Srini Penchikala: So, what are some emerging trends in this space? What are you seeing? Any innovative use cases in the GraphRAG area?
Nikolaos Vasiloglou: As I said, I see more effort going into building the knowledge graph. The same way, as I said, that you go to Google and it gives you an info box when you ask a question, I think now every company wants that same kind of experience, what I called the info box. And the other thing is this modernization of application building based on the knowledge graph. This is a shift that we’re seeing: before, people would build an application and it would have an application database. Now, modern apps don’t have a database that backs them; they have a knowledge graph that backs them.
And because they have a knowledge graph that backs them, they can also utilize language models. The enterprise is all about applications. That’s where the money is, that’s where the value is. So, we’re starting to see the rise of language-model-backed modern applications together with the rise of knowledge graphs. They go together. GenAI enterprise applications are enabled by the use of knowledge graphs.
Online Resources [33:57]
Srini Penchikala: So, are there any online resources or articles you can recommend for our listeners to get more information on these topics?
Nikolaos Vasiloglou: First of all, InfoQ publishes a lot of stuff. If you want to go a level deeper, I tend to get my information directly from conferences and workshops. That’s kind of the latest and greatest. It really depends on the level that you want to work at. First of all, LlamaCon was a few weeks ago. That’s a great place to go and find use cases, and see what’s going on and how people are using these things. So, I think the best way right now is to go to companies that have user conferences and see what they publish. LlamaIndex publishes a lot, LangChain, all these companies and open source projects publish a lot. I write a lot as well. We are on Snowflake, so I publish on the Snowflake Solution Center, but you don’t have to go there. As I said, my three favorite venues for the latest and greatest are NeurIPS, ICML, and ICLR; these are the top conferences in AI.
It might be difficult for some listeners. But if you go to my LinkedIn, I spend a lot of time distilling the information and making it simpler, like pulling out the trends and making it more accessible. And then on top of that, I use tools like NotebookLM, which is fantastic. I don’t know if you’re using it; I use it a lot. You can take difficult articles and it creates amazing summaries. In fact, I just realized that it even generates a knowledge graph of the document. It’s called mind maps, and you will see knowledge graphs there as well. It’s a new feature they recently released. So, yes, there’s a ton of resources; there’s not one specific source. I guess, if there is one person, that would be Andrej Karpathy. He is very well known for his video blogs and write-ups in the GenAI space in general. So, that’s someone for people to follow.
Srini Penchikala: Thank you very much for joining this podcast. Thanks, Nik, for your time. It’s been great to discuss one of the very important topics in the AI space: knowledge graphs and graph databases. They bring a more powerful dimension to data analytics and AI solutions. So, this has been a great discussion. For our listeners, thank you for listening to this podcast. If you would like to learn more about AI/ML topics, check out the AI/ML and data engineering community page on the infoq.com website. I encourage you to listen to the recent podcasts.
As Nik mentioned, we have a lot of articles on different topics, and also the trend reports that we publish on different areas like AI/ML, architecture, and culture and methods. They’re always very valuable resources, and we publish them for practitioners like you, to share what’s happening in the software development space. So, thank you, Nik. Thanks for your time. Have a good one. Thank you.
Nikolaos Vasiloglou: Thanks very much for hosting me.
Mentioned: Andrej Karpathy