
From Symptom Checkers to Smart Chatbots: The Role of AI in Virtual Care

News Room | Published 11 March 2026 | Last updated 11 March 2026, 9:08 AM

Transcript

Andre Riberio: I want to give you a walkthrough of what we do at Healthily, from symptom checkers to smart chatbots. Let’s start with a story. It was 2 a.m. A 7-year-old boy woke up with a headache and a fever. It doesn’t feel like an A&E situation, but the parents are still worried. They search on Google. Then they start spiraling. They believe it’s meningitis, which is one of the top results when you search on Google. Within 10 minutes, they head to A&E to get it checked, only to soon realize it was a sinus infection that could have been self-treated at home.

They lost time, they were stressed, and they lost confidence in their own judgment. This is just one family. Let’s take a second case. A 68-year-old man calls his daughter and stumbles over his own words. He’s slightly confused, but suddenly he snaps back to normal. He tells his daughter that he’s fine, that it was probably just stress and nothing really to worry about. The daughter searches online, can’t find any particular reason, and so they both wait until morning. What the father had was a transient ischemic attack, or TIA, a mini-stroke. This is a warning sign that a full stroke can happen in the next 48 hours, so immediate action needs to be taken to avoid permanent damage or even death.

These are two sides of the same coin. On one hand, we have a self-care scenario that led to an A&E visit, which raises NHS costs, causes stress, and leads to a loss of confidence. On the other, we have an emergency scenario that was missed by both the father and the daughter and could potentially lead to long-term effects. The question in both is always the same: what should I do? What should I do now, in my specific case? This is not a niche problem. It’s not just these two cases. In the UK alone, there are around 50 million health-related searches, and the number is increasing across all age groups.

More and more people are searching online for health-related questions. What is the real issue here? Is it about finding the information, or is it about how to act? What we’re really trying to solve is how to take people from outcomes, from what they could potentially have, to action. Let me take you through the whole journey from a user perspective. The user starts with symptoms, either symptoms they report themselves or symptoms observed in someone else. That could be a headache, a fever, or slurred speech.

Then they find out what it could potentially be. They may have some ideas, whether it’s meningitis, a sinus infection, or just stress. They still don’t know what they should be doing. They know what it could potentially be, and there are multiple possibilities. There’s a lot of information, but they don’t know exactly how to act. That’s where we come in. What does this mean for me right now, in my particular scenario? That’s what I want to know. That’s what I want to be able to act on. What users really want is the ability to act with confidence. Let’s take just this last step from insights to actions. The user question was, what should I do? This is not just based on what they need, which is obviously very important, but equally on what access to healthcare they have available and what they’re eligible for. For example, whether you are in the UK or in the U.S., those would be different things. We need to take all of this into account. This is one of the reasons why we are now partnering with health insurance companies: to create this missing link between what users have and what they can potentially do.

Background

I’m Andre Riberio. I have over a decade of experience in AI and ML systems. I’ve been working in healthcare, but also in other areas such as art analytics. I studied biomedical engineering and then pursued a PhD in clinical medicine research at Imperial College. I’m currently working as Chief Technology Officer at Healthily, overseeing the systems which I’m going to show you.

The Ideal Scenario

We talked about the problem. Let’s talk a little bit more about the solution now. What is the ideal scenario for this particular case, for this daughter? What she truly wants is a system which advises her accurately and reduces human error. She wants something which takes the least amount of time from stimulus to action, so it improves the response time. She wants to minimize future risk, which in this case is a full-blown stroke. She wants reassurance that the action she’s taking is the right one. How do we do it at Healthily?

At Healthily, we started by creating a smart symptom checker. This smart symptom checker is based on a ground-up tool that was developed by clinicians, by medical doctors. It’s based on a medical database that we built in-house. It’s clinically validated, and it’s an AI tool. Now you may ask: how can you make sure AI tools are accurate? I will explain that in a bit, but basically it comes down to what is called a Bayesian model, which is deeply rooted in math and statistics. Finally, as I was explaining, we partnered with healthcare insurers to create that missing link.

System Architecture

Let me give you a little bit of an overview of the system architecture, so you get a pretty good idea of what is going on behind the scenes. Like every other app, we start with a user interface, a web app, an API, or something else. In this case, the user starts with natural language. They can start with, I feel sick, I have fever and stomach pain. That input gets passed to what we call our natural language processing and chat engine. This engine routes to three other systems: the clinical reasoning engine, our semantic retrieval engine, and our flow-driven clinical logic. The first one specifically handles the base logic, and covers most of the assessment cases.

The second one is for when the user is looking for information, which I will explain in a bit, and it works with BM25 and dense encoding. Basically, it takes care of all of the semantic relationships between the user and our database. The final one covers the cases which cannot be handled by any symptom checker. These are dangerous situations the user may be falling into, such as suicidal thoughts. All of these get combined into personal care guidance. Basically, we align with the user’s healthcare insurance to point them to the specific service they need. Instead of telling you, go to see a doctor, I’m going to tell you: go to your MSK pathway, go to see your 24-hour nurse, and so on. That’s really what users care about: what action they should be taking given the symptoms, given the issues they have right now.
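The routing described above can be sketched in a few lines. This is a minimal illustration, not Healthily’s actual code; the route names and the rule that safety detection overrides the detected intent are assumptions drawn from the talk.

```python
from enum import Enum

class Route(Enum):
    CLINICAL_REASONING = "clinical_reasoning"   # symptom assessment
    SEMANTIC_RETRIEVAL = "semantic_retrieval"   # information lookup
    SAFETY_FLOW = "safety_flow"                 # dangerous-topic flows

def route_query(intent, dangerous_topics):
    """Dispatch a parsed query to one of the three downstream engines.

    `intent` and `dangerous_topics` would come from the intent-detection
    and workaround-classifier models described later in the talk.
    """
    if dangerous_topics:          # safety flows override everything else
        return Route.SAFETY_FLOW
    if intent == "assessment":
        return Route.CLINICAL_REASONING
    return Route.SEMANTIC_RETRIEVAL
```

The key design point is that the safety check runs unconditionally, so a dangerous topic can never be routed into a normal assessment.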

The NLP and Chat Engine

Let’s look in depth at the systems themselves. Let’s look into the NLP and chat engine. This is built from three main microservices. There’s a little bit more to it, but that’s about it in a nutshell. The first model is an intent detection model. It’s a model that detects whether a user wants information, wants an assessment, or is not really clear about what they want. Then we have what we call a workaround classifier. That is a model that detects dangerous topics. It identifies whether the user is saying something such as, I’m having suicidal feelings, in which case you cannot run a symptom check normally but have to go down a very specific flow. It also does symptom entity extraction. Given the raw query the user typed, I have a headache and fever, it will detect headache and fever in this particular case as the potential symptoms which we will pass on to the other services.

Finally, we use this system to detect whether an assessment is possible, and this informs what step we are going to take next. I’ll go a little deeper into how these models were trained, because the point I’m trying to make here is that this is a health product. It is a Class I medical device at the moment. We are very focused on the safety and control of each system. The reason they are microservices and work independently is so that we can validate them properly. The intent detection model in particular was trained on manually labeled data from the year before. We went over that dataset and decided whether each query was asking for information, for an assessment, or neither.

Then we further provided a dynamic button on the interface which allowed the user to also say whether they wanted information or an assessment, and we used both sources of information to fine-tune our own transformer model, which gives us this intent detection. The workaround classifier, on the other hand, was trained using a Siamese neural network. This network compares the embedding of the user query against seed queries for each topic, which can be a mental health situation such as suicidal thoughts, a burn, or other scenarios. Given those two, we can compute the cosine similarity between them, which is simply a measure of how similar the embeddings are, and therefore we can assign a threshold.
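A minimal sketch of that similarity-and-threshold idea, assuming precomputed embeddings. The real system uses a fine-tuned Siamese network; the topic names, vectors, and thresholds here are invented.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def flag_dangerous_topics(query_emb, seed_embs_by_topic, thresholds):
    """Flag every topic whose best-matching seed query exceeds its threshold.

    Per the talk, thresholds are tuned per topic by the clinical team,
    balancing precision against missed cases.
    """
    flagged = []
    for topic, seeds in seed_embs_by_topic.items():
        best = max(cosine_similarity(query_emb, s) for s in seeds)
        if best >= thresholds[topic]:
            flagged.append(topic)
    return flagged
```

With toy 2-D embeddings, a query vector aligned with a self-harm seed is flagged while an orthogonal one is not; in practice the embeddings would come from the Siamese encoder.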

Our clinical team balanced the precision and accuracy of each one of these topics and defined what the threshold should be. What I want also to make clear is it’s not just about building the models, it’s about the governance of such models, the regulation. We have a whole process from making sure we validate them, review them, and so on, from the beginning to the end, and the models need to be constantly revised to make sure they are accurate.

Then the final one of these UQML models is the symptom entity extraction. In fact, this is what we call the Mediterm model, which is slightly different, but it’s still within the same realm. This model is based on Stanford CoreNLP and enhanced by our own dataset. We extend it with our medical concepts and synonyms, which are proprietary. There are two main points to this system. The first one is label-based concept activation. What you start with, which in this case would be head pain, will activate two different concepts: head, which is one concept, and pain, which is another.

Both of these get combined into a single concept, which is headache in that particular scenario. The system is able not just to pick a single word or multi-words, but to combine them into a higher and higher level of symptoms. The second point is hierarchical symptom propagation. We started from a pain in the bottom of the foot, which was identified as a concept, but we can also activate its parent, which is pain in foot. Why do we do this? Because logically, if you have pain in the bottom of the foot, you therefore have pain in the foot, and that increases the reliability and coverage of our symptom checker, which we will be explaining.
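Hierarchical propagation of this kind can be sketched as a walk up a parent map. The concept names and hierarchy below are hypothetical, chosen to mirror the foot-pain example.

```python
# Hypothetical parent map: child concept -> parent concept.
PARENTS = {
    "pain_bottom_of_foot": "pain_in_foot",
    "pain_in_foot": "pain_in_lower_limb",
}

def propagate(concepts, parents=PARENTS):
    """Activate each detected concept plus all of its ancestors.

    If you have pain in the bottom of the foot, you logically also have
    pain in the foot, which is what widens the checker's coverage.
    """
    activated = set()
    for concept in concepts:
        node = concept
        while node is not None and node not in activated:
            activated.add(node)
            node = parents.get(node)  # None once we reach a root
    return activated
```

Detecting the leaf concept therefore activates every broader symptom above it, which is exactly the reliability argument made in the talk.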

The Clinical Reasoning Engine

We explained the NLP part. I’m not going to explain all of this, just the top two. The NLP is the engine that routes into these three systems, and the engine which provides the symptoms that the clinical reasoning engine will use. What is this engine? This engine, like I explained, takes Bayesian inference, safety rules, and red flags, combines all of these, and then provides the next set of questions that we should ask the user. Taking the structured symptom input provided by our NLP model, we feed all of that to our Bayesian inference, safety rules, and red flags. What we truly start with is the red flags. Let’s take this case: the user reported chest pain. That’s a red-flag symptom, because it is associated with many conditions that are red flags and dangerous. Given that, the system directly asks about recent injury, because there are multiple conditions associated with chest pain plus a recent injury.

If we can rule out a recent injury, we can rule out those particularly dangerous conditions. That’s the first thing we do. We focus very much on safe triage. Then comes the big model, which is the Bayesian inference part. These are two simplified formulas, but they basically give you everything you need to know. The way it works is we try to maximize the information we can get from the next round of questions. The user told us that they have chest pain. They told us they have confusion. Now we need to work out the best set of symptoms to ask about that will most reduce the uncertainty of the system. We use Bayesian inference to calculate the probability of a condition given a set of symptoms, and we also look at the distribution of those probabilities. We calculate the entropy and we minimize it, and from that we pick the top set of symptoms to ask about, driven by entropy minimization.

Given that, we can ask multiple rounds of questions. We keep improving the prior of the system, the knowledge the system has, and how confident it is about the probable condition. We can then rank the list of possible conditions at any point, with an associated triage level. Given that, we can obviously compute the posterior of each condition. That’s the same formula as before; the only thing we are adding here is a threshold. Across multiple rounds of questions, we become more and more certain that it can be a particular condition. The medical team also defined what this threshold should be, which was 90% across at least 12 rounds, and that decides when the system is confident enough to finish and provide the outcome. The conditions that we later show will depend on how relevant they are to the user, not just on this particular threshold.
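The mechanics described here, a posterior over conditions plus entropy-minimizing question selection, can be sketched as follows. This is a toy naive-Bayes version with invented conditions and probabilities, not Healthily’s clinical model.

```python
import math

def posterior(prior, likelihood, answers):
    """P(condition | answers) via naive Bayes over yes/no symptom answers.

    prior:      {condition: P(condition)}
    likelihood: {condition: {symptom: P(symptom present | condition)}}
    answers:    {symptom: True/False}
    """
    post = {}
    for cond, p in prior.items():
        for sym, present in answers.items():
            p_sym = likelihood[cond][sym]
            p *= p_sym if present else (1.0 - p_sym)
        post[cond] = p
    total = sum(post.values()) or 1.0
    return {c: p / total for c, p in post.items()}

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def next_question(prior, likelihood, answers, candidates):
    """Pick the unanswered symptom minimizing expected posterior entropy."""
    post = posterior(prior, likelihood, answers)
    best_sym, best_h = None, float("inf")
    for sym in candidates:
        if sym in answers:
            continue
        p_yes = sum(post[c] * likelihood[c][sym] for c in post)
        expected_h = 0.0
        for present, p_ans in ((True, p_yes), (False, 1.0 - p_yes)):
            if p_ans <= 0:
                continue
            expected_h += p_ans * entropy(posterior(post, likelihood, {sym: present}))
        if expected_h < best_h:
            best_sym, best_h = sym, expected_h
    return best_sym
```

In a two-condition example where symptom s1 strongly separates the conditions and s2 is uninformative, the selector picks s1, which is the entropy-minimization behaviour the talk describes; a stopping rule would then compare `max(posterior(...).values())` against the 90% threshold.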

We explained everything from the symptoms to the outcomes, what users may have. But that is not really the goal, as I said at the beginning of this talk. It’s more about the recommended triage, the actionable next steps. What should I do? That’s where the partnership with health insurers is really important. It provides us that exact link: what the user can do. Here is a simple example where the user started with back pain. It was an MSK scenario that led us to recommend the physiotherapy pathway, for which the insurer has a direct referral route. It bypasses speaking with the doctor and goes directly where they need to be. Or anxiety, which led to mental health: maybe self-care is enough, or maybe the system will decide whether a therapist may be needed. Again, the insurer may cover a counseling app. Chest pain: urgent. We detected that it may be an A&E scenario, so call 999 directly, or, if the insurer has an option, inform the insurer as well. Or the very common case of a routine GP appointment, which is a typical scenario you will see in every symptom checker. We can also tell you to do a video GP or an in-person visit, which again you can book directly with the insurer.

Demo – Healthily’s Smart Symptom Checker

This is just a very quick example, which we are trialing with a health provider. The user is shown a lot of navigation tools regarding insurance-related information. They decide to go into the symptom checker. They add their age and gender. They go through and confirm they don’t have any rare medical condition that would prevent them from using the symptom checker safely. They say they have slurred speech, which is what they started with. The system then asks them to confirm that they do have slurred speech. This is important. The point I’m trying to make is that we always want to be sure. This ratification step is very important for us because it makes sure that what we’re detecting is actually correct. They decide to continue with the assessment. They say they have had this for a few hours only. They have high blood pressure. They are not overweight. They are not a smoker. They have some slight confusion, and it is new confusion. They did not have an injury, but they have some difficulty with language or speaking.

Then the system keeps asking different sets of questions, which, like I explained, relate to triage, to red flags, and to the possible conditions you may have. It does this for a few rounds, which hopefully is not too many. You get a pretty good feeling of what is going on. Also, remember that for the probabilistic model we are running, it’s not just about what you said yes to; what you said no to is equally important, because it will exclude conditions or make them less probable. When the system is confident enough, which takes a few sets of questions, it will provide you with a report.

In this case, it recommended 999 because it’s an ambulance scenario, and we can see exactly why. These were the ruled-out conditions, and these are the most likely, with TIA right there, and also ischemic disorder, which is stroke. We also allow the user to select other options. We recommend these, but we tell them: if you do want to see your doctor, you can obviously do so, or call 111. We also give them information about what they typed, so they can confirm that what we captured is correct. We explain all of this, from symptoms to conditions to actionable results.

The Evolution to Conversational Chatbots

What am I going to show you next? There is still a missing piece, which we didn’t cover: sometimes the user starts with just a question. They don’t really have a symptom yet, or they may, but they may not really know where they want to go. They start with a simple health question. How can we address this? How can we make sure that we can also serve these people? That’s where we started looking into conversational chatbots, moving from symptom checkers to fully conversational chatbots. What do we do? I’m not going to go into too much detail here. This is a completely new product we are building, but I will show you exactly how it works. It’s based on RAG, Retrieval-Augmented Generation. I’m just going to explain how this is applicable in the healthcare sector.

First of all, we start with a question-aware transformer. This simply means that if the user started by asking what COVID is, and then asked what the symptoms are, we can generate a standalone question: what are the symptoms of COVID? It combines the previous context into a self-contained question. Why is this relevant? Because we then use that question to search our health database, which is one of the largest in the UK. We use a retriever model to chunk our data, encode it, and search it against the user query. To make this even better, we then use what is called a reader model. It goes over these chunks, extracts the snippets which best match the query, and re-ranks them. All of these models, these two and the previous one, are fine-tuned using our own proprietary data. We did this to make sure the system is as good as possible for our scenarios. They also run as separate microservices, again so that we can control each one of them and know where each one fails.
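As a rough illustration of combining lexical BM25 scores with dense-encoder scores, here is a sketch over pre-tokenised chunks. The real retriever and reader models are fine-tuned and far more involved; the blending weight `alpha` is an assumption.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25 over tokenised chunks (a sketch, not a tuned ranker)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequencies
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

def hybrid_rank(query_terms, docs, dense_scores, alpha=0.5):
    """Blend lexical (BM25) and dense-encoder scores; best chunk first."""
    lex = bm25_scores(query_terms, docs)
    blended = [alpha * l + (1 - alpha) * d for l, d in zip(lex, dense_scores)]
    return sorted(range(len(docs)), key=lambda i: -blended[i])
```

A reader model would then re-score the top-ranked chunks against the question; here the `dense_scores` stand in for the output of the dense encoder.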

Then, finally, as everyone does, we go into the LLM world of answer generation. Here we do use a commercially available model. We are moving towards our own fine-tuned model as well, the same way we have been doing elsewhere. Things are evolving really fast. Language models are getting really good, and we want to be able to provide the best possible solutions, so we are currently using the best models out there. But we’re not just taking an LLM and running with it. In fact, we are building a lot of guardrails to make sure that the systems we have are safe.

Given our RAG model, we’re looking into restricted knowledge, making sure the model only answers questions which are health-related and based on our data. Factual checking: making sure the answer the LLM provides is truly based on the content it was given. Uncertainty estimation: is it hallucinating, or is the answer correct? And is it safe? This is probably the biggest one. If the user asks, can I take an antihistamine for my allergy, you could just say, yes, totally fine. That would be incorrect, because if you are pregnant, you cannot take certain anti-allergy medications. That would be a directly unsafe answer through missing context. That’s what we are working on right now. This is obviously a lot of work, but we’re building towards a safe chatbot.
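As an illustration of the factual-checking idea, here is a deliberately crude grounding check based on token overlap. Production guardrails would use entailment-style or model-based checks; the 0.6 threshold is an arbitrary assumption.

```python
def grounding_score(answer, source_chunks):
    """Fraction of answer tokens that appear in the retrieved sources.

    A crude stand-in for the factual-checking guardrail described in the
    talk: an answer mostly made of words absent from its sources is a
    hint that the model is drifting from the retrieved content.
    """
    source_vocab = set()
    for chunk in source_chunks:
        source_vocab.update(chunk.lower().split())
    tokens = answer.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in source_vocab)
    return hits / len(tokens)

def passes_guardrail(answer, source_chunks, threshold=0.6):
    """Accept the answer only if it is sufficiently grounded in sources."""
    return grounding_score(answer, source_chunks) >= threshold
```

A failing check could trigger a refusal, a re-generation, or escalation to a safer canned response rather than showing the answer to the user.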

Demo – Healthily’s Chatbot

I’m going to show you one very quick demo. This is not a product as of now. We are trialing it at the moment, and most of this is going to change. I’m just going to show you exactly how this links with what we were discussing before. In this case, the user just asked a question about headache and fever. The system provides information about it and also points, in this case, to that particular URL, which is our own content.

Then they go, ok, in fact, this feels like what my daughter has, which was our son’s example earlier, not a daughter. They believe it’s meningitis. From this point onwards, the UQML models, our intent detection model, have already picked this up and drive them directly to a symptom checker. What you see there at the end is exactly as safe as the symptom checker we had before. The questions it asks are still based on the Bayesian network. It’s still based on the safety rules. It’s just presented in a completely different setting, a conversational chatbot.

Future Work – Symptom Checker and Chatbot

Where do we want to take it from here? There’s a lot to do. For example, on the symptom checker, we are pursuing Class II medical device certification right now. This is to ensure reliability and clinical trust in our systems. We are already a Class I medical device. On the chatbot side, we’re ensuring alignment with regulated guidance and building all of these guardrails, which will allow us to say that something is safe to a certain level. We keep making model improvements, on the symptom checker as well as on the chatbot. Regarding our architecture, we’re improving the data and how the symptom checker uses that data; we have various projects to do that. On the chatbot, we’re trying to make the retrieval better and more aware of what has been said before, and to incorporate the outcome, the report you get from the symptom checker, directly into the chatbot, so a user can ask questions about it.

Finally, transparency. We have a transparency document online, which you can look into; it should tell you everything that we do, and we keep working on it. We’re trying to improve the rationale of the symptom checker, explaining why we are asking you about that particular symptom. On the chatbot side, we’re trying to reduce hallucination, improve safety, and improve source attribution.

Summary

At Healthily, we’re trying to make a system which is end-to-end. It starts from the user’s question, from their symptoms, and goes all the way to actions, what they can truly do with that information. We’re trying to be as conversational and natural as we possibly can; that’s where the chatbot came in. We’re trying to make it as safe and as reliable as it can possibly be, which is the intelligent reasoning using the probabilistic model and safety-aware triage. We are integrating with healthcare providers and insurers so that our systems give users services which are applicable and real to them. We’re trying to be as clinically aligned as we can possibly be by creating guardrails and by improving validation and auditability in our systems. At the end of the day, we’re building this for people. We are honestly trying to help users have more confidence and act when they should act.

Questions and Answers

Dr. Kreindler: This is one of the key areas where the large language models are playing an important role. At what point do you feel there will be an insurable class II product that you can genuinely have a proper chat with rather than go through clickable links?

Andre Riberio: We’re actively looking into that. We’re obviously starting with a Class II medical device on our symptom checker. That’s the first part of it. We’re building step by step on that. The conversational UI, which I was showing at the end, gives the user a way to start getting used to that conversational format. It’s easier to go to the regulator and say, this can still be a Class II medical device because it’s still the same questions asked in the exact same way, just in a different format.

Then it’s about proving that you can go from that point into proper natural language. I could already tell you we could. We could, for example, use our NLP model to keep asking them, is this what you said? Is this what you said? Probably not a great user experience, but it would work, because if they keep saying yes or no to what we identify, it would be equally safe, because they are ratifying our questions. Where we want to get to is a point where we can be certain to some level X, and I don’t know yet what that number is, at which we are ok and safe. That will be the point. We are working through a Class II medical device on the symptom checker. We’re building these chatbots now with insurers as well, to validate whether people want them and whether they engage with them. From there, yes, we’re pushing the boundaries and trying to get to Class II.

Dr. Kreindler: Have you done any research yet on the usability difference between having a chat versus clicking buttons?

Andre Riberio: We’ve recently done a study which showed exactly this: the chatbot, and trying to understand its usability, whether people would like it. For example, the integration at the end of the assessment, whether they would prefer that versus the more classical, form-based approach. People do seem to prefer the chat style because it’s incorporated into the questions and it just feels a more natural way to interact with the system.

Dr. Kreindler: Not official party line, but in terms of your intuition, how much do you think the multimodal assessment of us as clinicians, in terms of intonation of voice, visual appearance of things, just that kind of, you might be short of breath, but you’ve got a bit of a wheeze in your voice and you otherwise look nice and bright and shiny and got red cheeks. It’s probably not the worst kind of shortness of breath. How much do you think multimodal is destined to become part of systems like yours?

Andre Riberio: Yes, for sure. It has to move that way at some point, because right now, if the user reports, for example, a skin concern, you can ask questions about it. You can ask, is it red? Is it uniform? You can try to address it, but you are still very much limited to what the user knows and how they can reply. The problem is that the systems are not yet accurate enough to take a picture and say with confidence what it should be. That’s why that realm is still hard, but it will eventually get there for sure. Again, we can start thinking about situations where you could try to understand directly from the picture and then get it ratified by the user, checking whether what you’re seeing matches. There is still a gap between where we are and those scenarios, but for sure that will happen.

Dr. Kreindler: I suppose at very least you can collect that and present a referral note with more data than just text-based things. You could even do that today.

Andre Riberio: Yes.

Participant 1: You mentioned about the ratification. I was just wondering if you had any further data quality challenges and what you’ve done about them.

Andre Riberio: The ratification?

Participant 1: Yes, you mentioned that one. Do you have any others?

Andre Riberio: Yes. We have a few control systems. Ratification is one of them, the one we were just talking about. That covers exactly the point where there is a text question from the user and we convert it to symptoms. We always ask. At some point we considered saying, if you’re 90% confident, that’s fine; we decided not to. We decided it’s always better in this particular scenario, at least for now, to always ask: is what we identified correct? We’re also training the system; it keeps adding to the data. I don’t have the statistics right now on what the numbers look like. I had a slide at the end on future work: we are in fact trying to compare this with current LLMs to see how good they truly are. We have the dataset right now; we just have to work through it. We also have another control point, called clarification rather than ratification. That’s for when the user gives you a symptom that can mean multiple things.

If the user mentions a symptom that can mean multiple things, we ask them which of those things they mean, because a simple ratification wouldn’t work: they could say yes and mean a different thing. We give them a set of options which they select and can pursue. Those control points make sure that all the text processing we do is correct, up to the user’s knowledge, obviously. All of the questions we ask afterwards are not generated in any way in the symptom checker; they come very specifically from our dataset. Those are fully controlled. At the end, we further give the user the report on exactly what they said, so they can always check whether something was missed, which it shouldn’t be. If something was missed, the user would be aware of it, or the doctor, because we can provide those reports to the doctor as well. Those are the control points.

Participant 2: How does it deal with multiple symptoms? Like I’ve got a pain in the foot and a sore head.

Andre Riberio: Yes, it can definitely detect multiple symptoms. It can detect as many symptoms as are there. It works with single words and multiple words. Sometimes it does mix them up if they are too close together and refer to different concepts, but that’s not common. What we have behind the scenes is a dataset which we manually created, with each primary concept and its associated synonyms. For example, for headache, we have things such as cranial pain and head pain. Each of these is also its own concept: cranial is a synonym of head, and so on. Any combination the user types will still lead to headache. We also use that in our retrieval. Any combination will still work. So yes, it does work for multiple symptoms within the same query, and multiple words per symptom as well.
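That synonym-to-primary-concept lookup can be sketched like this. The dictionary is a tiny hypothetical slice, and the real Mediterm model does much more than substring matching.

```python
# Hypothetical slice of the synonym dataset described in the talk:
# each phrase resolves to a primary concept.
SYNONYMS = {
    "headache": "headache",
    "head pain": "headache",
    "cranial pain": "headache",
    "fever": "fever",
    "high temperature": "fever",
}

def extract_symptoms(query):
    """Return the distinct primary concepts mentioned in a free-text query.

    Longest-match-first, so multi-word synonyms win over their single
    words; matched spans are blanked out to avoid double counting.
    """
    text = query.lower()
    found = []
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        if phrase in text:
            concept = SYNONYMS[phrase]
            if concept not in found:
                found.append(concept)
            text = text.replace(phrase, " ")
    return found
```

So "I have a headache and a high temperature" yields both the headache and fever concepts, each resolved to its primary name regardless of which synonym was typed.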

Participant 2: Would it view them as two separate things?

Andre Riberio: Yes.

Participant 2: If they’re part of the same problem, then it may not get it correct then?

Andre Riberio: In the example I had there, I have a headache and fever, they are detected as two different symptoms. What we do is take them both together to explain a condition. In fact, there is a point to be made there, which is: is the user reporting too many symptoms, symptoms they may not truly have or may not really know about, and thereby not leading to an assessment? Because we do take all of them as necessary for an assessment to be possible. There are multiple things we are working on there, in fact. What we have been seeing is that users usually type around 1.5 symptoms per initial query.

Typically, the assessments are always possible. There are a few scenarios where they may type 5 to 10 symptoms, where it is not. Or they may type so few, starting with a headache and nothing more from the beginning to the end of the assessment, that we cannot truly provide an assessment. There we redirect the user to a few fallbacks, such as providing them information and saying: this is not reliable, in the sense that you only told us about headaches, and that can be many different things. So yes, to your point, we do take them as different symptoms, but we do try to explain all of them together with a single condition.
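The fallback routing described above, where too few or implausibly many distinct symptoms make a reliable assessment impossible, reduces to a small threshold check. The thresholds and messages here are illustrative assumptions; the talk gives only the observed average of roughly 1.5 symptoms per initial query and a 5-to-10 upper range where assessment fails.

```python
# Illustrative thresholds, not Healthily's actual values.
MIN_SYMPTOMS, MAX_SYMPTOMS = 2, 5

def triage_route(symptoms: list) -> str:
    """Route a query to an assessment or to an informational fallback."""
    distinct = set(symptoms)  # duplicates of the same symptom don't count
    if len(distinct) < MIN_SYMPTOMS:
        # e.g. only "headache" from start to finish: too unspecific.
        return "fallback: informational only, not a reliable assessment"
    if len(distinct) > MAX_SYMPTOMS:
        # e.g. 5-10 reported symptoms: too many to explain together.
        return "fallback: too many symptoms for a single assessment"
    return "assessment: explain all symptoms with a single condition"
```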

Participant 3: What would happen if the user can’t complete the workflow? I’m going through a workflow and I lose feelings in my arm and I drop the phone.

Dr. Kreindler: You fall unconscious.

Participant 3: Exactly. If I have a human at the other end of the phone, they could realize something's not quite right, the user has fallen off track. How do you trust your validators? How do you know that your validators are actually a good guardrail to the prognosis, basically?

Andre Riberio: What happens if the user stops in between? At the moment, nothing, truly. We are aligned with the health insurance providers, and the decision was not to necessarily proceed. We could do it, because we do have timers. We know when they stop the consultation. We know who they are (not we at Healthily, because we work with anonymous data, but the health provider does). We could potentially notify them that something happened and let them follow up. At the moment, no, it's not in the plans.

How do I trust the validators? By the validators, do you mean our clinicians, or our models?

Participant 3: Your models that are the immediate response.

Andre Riberio: There’s two pieces to that. There’s the Q&A piece, which is the chatbot, which uses models to validate. Those, yes, we keep improving them. We are using clinical data to better improve those models and hopefully reach a point where, like we said, it can be a class II medical device. On the symptom checker, we have many different tests. We have a set of tests which are Monte Carlo tests. If you know Monte Carlo, basically it means that we generate samples of symptoms, which are based on the condition we’re trying to target. We do a hundred different combinations of those, and we run them through the checker and we make sure to see what the outcome would be. Then the medical team will validate that.

Besides that, we also have vignettes manually defined by doctors. They created 2,200 vignettes, of which 460 are core, and those all need to pass. It basically means that for that set of symptoms, you have to get that condition. If it doesn't work, it means that we changed something in the system and we have to fix it. Or we prove that it shouldn't be that condition, and then they fix the vignette. So there are a few sets of tests: manual vignette tests, automated tests, which are the Monte Carlo ones, and even manual tests of the UI experience, making sure that the buttons exist and target the things they should be targeting. We have a whole plethora of tests.
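The two automated test layers described above can be sketched with a toy checker standing in for the real one: Monte Carlo tests sample random symptom subsets for a target condition and verify the outcome, while each clinician-written vignette pins one symptom set to one expected condition. Everything here, the conditions, symptoms, and the checker itself, is an illustrative assumption.

```python
import random

# Toy condition-to-symptom knowledge base (illustrative only).
CONDITION_SYMPTOMS = {
    "sinusitis": {"headache", "fever", "facial pain", "blocked nose"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}

def toy_checker(symptoms: set) -> str:
    """Pick the condition whose symptom set overlaps most with the input."""
    return max(CONDITION_SYMPTOMS,
               key=lambda c: len(CONDITION_SYMPTOMS[c] & symptoms))

def monte_carlo_test(condition: str, runs: int = 100, seed: int = 0) -> bool:
    """Run many random symptom subsets of `condition` through the checker."""
    rng = random.Random(seed)  # seeded so failures are reproducible
    pool = sorted(CONDITION_SYMPTOMS[condition])
    for _ in range(runs):
        sample = set(rng.sample(pool, k=rng.randint(2, len(pool))))
        if toy_checker(sample) != condition:
            return False  # flagged for the medical team to review
    return True

# A vignette is the manual analogue: one fixed symptom set, one expected
# condition, and it must always pass.
def vignette_passes(symptoms: set, expected: str) -> bool:
    return toy_checker(symptoms) == expected
```

When a vignette fails, the talk notes two resolutions: either the system regressed and must be fixed, or the clinicians agree the expected condition was wrong and fix the vignette.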

Dr. Kreindler: The scariest thing, as someone triaging a patient with some symptom, is to be able to say: it's ok to go home. There is always a degree of, was that the right thing to do? Always, even when you've got other information about them, blood tests and other things. It's non-trivial to give anyone the all clear on anything.

Participant 4: Do you have an idea of how using your system compares to going to an actual doctor? Do you have any numbers on performance, maybe?

Andre Riberio: We do have data from Imperial College from some time ago. I can tell you how it performed against competitors as well: it was the safest tool out there compared with any competitor. Don't get me wrong, what I mean by this is that all of us, all competitors including us, are over-triaging, which also goes to Jack's point. We want to make sure that we always give the safest option, even if that reduces our accuracy. That's our primary focus. If I'm telling you that it's self-care, I'm pretty sure that it should be self-care, and the other way around: something we called an emergency may in fact have been just a GP consultation, because we want to make sure that we are correct. We have a lot of red flags to make sure of that. We are the safest. Accuracy-wise, we did go slightly lower than some of our competitors. Compared with doctors, it's also really interesting, because if you put three doctors together, they will not agree.

Dr. Kreindler: You’ll get four opinions.

Andre Riberio: Exactly. It’s very hard also to get that number, although we are working towards that now. Our medical team is validating some of the data we sent through one of our health insurance partners. We do have everything from who went and clicked to see their digital GP. We also know exactly what came out of that. We’re trying to look into that now and try to get the full picture on how accurate it is and how better you can get.
