Busting AI Myths And Embracing Realities In Privacy & Security

Transcript

Katharine Jarmul: We’re going to talk about realities and myths when we think about privacy and security in AI and machine learning systems. Who here uses some sort of Anthropic-based assistant? The most recent Anthropic report said, for the first time ever, Anthropic is seeing more automation than augmentation. What does that mean? It means less of, can you make this text better? Less of, can you generate this image for me? Less of, what is X? More of, I want you to do A, B, C, D, go do it and come back to me. This is great. This was the promise of AI systems in a lot of ways that we could have 4-day work weeks and we could have relaxed times, and computers will just do stuff for us. That’s the whole reason why we’re building this.

I don’t know if anybody here works in privacy or security as well. How do you feel about this? What’s the feeling like right now? It’s a little bit like this. Because we’re not quite sure yet, there’s not best practices yet. We have best practices from privacy and security for many decades now, but it’s not yet sure how do we allow things like automation or things like agents or something like this and still provide some semblance of privacy and security. Every privacy and security team, they want enablement, but they also are on the line. Why is this still a problem?

What we’re going to talk about is it’s difficult to decide in privacy and security in machine learning right now or AI systems, what’s real threats and what’s relevant threats. That’s a real difficulty in today’s bubble around AI in a lot of ways. I do a lot of advising, consulting, and trainings at different companies, and a question I always get from privacy and security teams is, who is really an AI expert and do we need them? Obviously, a lot of my work has been in training deep learning models. I have a different understanding of AI than maybe somebody that uses a model.

If at your company you’re not actually training models and instead you’re using many models, do you really need somebody there who knows how to train a model? I think we can debate, probably not. You have to decide what is AI expertise at your organization and who gets to then exercise that expertise and try to help make these privacy and security decisions. We also unfortunately have a big problem in the privacy and security field, and I will say it out loud, and I don’t agree with it, of using fearmongering to sell things. I don’t know what your LinkedIn feed is like, but mine is now like, we’re all going to get hacked tomorrow by the AI or whatever, and just screaming.

If you scream every single time, eventually, what happens? Nobody listens to you anymore. If you scream just to sell and then somebody buys it and then it doesn’t solve all their problems, then also people are less likely to engage with privacy and security topics. Another problem, security and privacy blame culture. The best question that I’ve found when I go in and I ask the privacy and security team, how’s it going at their organization? The first question I ask is, how many incidents do you have per month? What’s the right answer? Is the right answer zero? No, why?

Participant 1: Because that means that most likely you are ignoring security incidents.

Katharine Jarmul: Exactly. If people are afraid to come forward and say, I don’t know if this is the way I’m supposed to do it. I think I’ve accidentally leaked this key somewhere or whatever happened, because things happen. If it’s zero incidents that are reported, it doesn’t mean there’s actually zero incidents, it means you don’t have a trust culture where people can come forward. You don’t have psychological safety around privacy and security.

Perhaps, maybe, either on purpose or on accident, you have this blame culture where people are afraid either they’re going to get a bad performance review or they’re going to lose their job or they’re going to lose respect at the company if they say either, I don’t know how to do this right, or if they say, I’ve made a mistake. We go fight against that. How do we fight against that? We talk about building responsibility, building agency, and building ownership. That’s exactly where I mainly focus in and that’s what we’re going to talk about. How do we build a culture of responsibility and ownership of privacy and security so that it’s not weird and scary and not part of your job but instead can be like a normal part of conversations at your work?

Myth 1: Guardrails Will Save Us

The first myth we’re going to talk about that I think is very big in this space is, guardrails are going to save us. Who here knows what I mean when I say guardrails? Who here feels a little fuzzy, like heard the term, not quite sure where does it live, what does it do? Yes, I’m with you. I work in this field and I feel like I’m number two because guardrails is a term, and we’re going to go through all of them, but guardrails are used to create safety and privacy in models or at least to try. Guardrails, we need to disambiguate because it’s used for many different things right now, and we need to disambiguate the term so that we can better understand it.

One type of guardrail, probably the first guardrail that really got launched at any scale was software-based guardrails. This is basically like you have an LLM or you have some system and then you basically have an input-output filter, and then you have the software on the other side. This was implemented in the first code assistants because it was found, which we’ll get to later, that outputting copyright or private repository code was problematic in things like code assistants, that they were quite good at repeating other people’s code verbatim. What happened is these things like very intelligent memory systems like a Bloom filter or whatever, just intelligent memory architecture used to look at the training data and say, this training data is under so-and-so license, or this training data is copyrighted, or we don’t quite know if we can use this training data, and find matches and then filter those out, and basically say, stop after a certain amount of tokens. Please stop outputting this copyright or weirdly licensed or unclear licensed content. Sounds reasonable. Should work. Feels like a good solution.

Anybody have any idea how it might not work? How would you break this? Perhaps like the software engineer commits code that should be in private repo in a public, that’s definitely one way. Chiyuan Zhang, who is a researcher in the space of privacy in machine learning systems, really easily bypassed this by just changing the variable names to French. This is a copyright, I think Google Code because at the time I believe that he was still researching or working with Google, and just changed the variables to nombre and then no problem, like no problem, can continue. Of course, this gets past the Bloom filter, because it’s different enough, and yet for any developer, we could also just ask the LLM, can you translate back to English? It’d be no problem. Or whatever language you’re using for your variable names. Great for some things, really good for some things, software-based guardrails, deterministic, useful. Use them, but know their weaknesses.

There’s another type of guardrails. If you’ve ever used Llama Guard or heard of Purple Llama, or probably if you’re using a cloud AI vendor, they probably have something like this that you can set up. This I call external algorithmic guardrails. Now we are looking more at the whole system. We have software APIs. We got those input and output processing guardrails, these memory architectures or simple matches that you’re looking for. Then between the LLM and those, you have these algorithmic guardrails.

Usually these algorithmic guardrails, they’re algorithmic, they’re either another machine learning model, like a simple classifier, or they’re an LLM-as-a-judge, you might have heard about something like this. Your results may vary, we can talk more about that. This is in charge of saying, I think either this prompt is something we shouldn’t answer based on our rules. I think it violates privacy or I think it has to do with crime or I think it has to do with nudity, or whatever it is that your content control should be, or after the LLM processes to flag it on the way out.

Then to replace, I’m sorry, I can’t do that request, here’s some other stuff that I can talk about. Which means you might have a cycle there. If something comes out that you don’t want, you have to re-prompt. How do we get past these? Any ideas? This was a really cool attack, I thought it’s called ArtPrompt. It basically takes your words and turns the potential bad keywords into ASCII art. The LLM has seen enough ASCII because it’s on the internet. If you ask how to build a bomb and you mask bomb into ASCII text — they’ve probably fixed this — but you used to be able to get GPT to teach you how to build a bomb. The interesting thing about this is that humans are really smart and they will figure out fun tricks to get around whatever algorithms you put around them. We’re naturally curious, we’re going to figure it out.

Then you’re going, maybe we got to fix the LLM itself. That’s where we get back to what most of the large AI vendors are already doing, RLHF or DPO, it’s basically fine-tuning, so reinforcement learning with human feedback, and is called now alignment, but it’s basically one of the last steps of training. There’s a human, they look at things, or sometimes now there’s an LLM that looks at things and decide, out of these three options, this is the one that we like the most, and then we use that data to then update the model, so that we get more and more answers that are more and more like what we want and less like what we don’t want.

This is actually retraining the model, this is updating weights and biases, this is actually changing the model’s behavior. Will it work for everything? No, because there’s plenty of data, information in the model that I can activate. I’d asked it, can you build me an IMSI catcher, which is illegal, and then I say, I’m definitely a researcher, and I get the instructions. There are still many ways to bypass even alignment training, and this is just because these things are still in the models that we use. Should we use guardrails? Should we do alignment? Absolutely. Will it save us? Not all the time. Use with care.

Myth 2: Better Performance will Save Us

Myth number two, better performance is going to save us. Who here’s heard this one? When the models get even better, they’re going to also know about privacy and security. I get this a lot, it’s fine. We’re going to take a little bit of a walk through the history of today’s largest AI models, and we’re going to start with understanding what overparameterization is to some level. Overparameterization means I have more space, I have more parameters in the model than I have data points in my training data. It basically, like computer scientists, developers, it’d be like, you have enough data to fit on a thumb drive, but you instead choose an SSD that’s like four times the size. This is essentially the paradigm that we’re working in, and this is just an example of parameter size growth over just the GPTs. We have data. We have even more space to save information than we have data. What could happen?

Interestingly enough, as this happened, we also had the death of overfitting, is what I like to call it. We basically stopped overfitting. We used to have something that looked like this, the left side. When you’re training a deep learning model, you are watching the test error, and as the test error started to rise, you would make sure it’s not just a blip, and then you would do early stopping. You would stop, because you’re worried that you would overfit on the training data, and you wouldn’t be able to generalize well when you saw new information. That’s over now. Now we have models somehow that can overfit to some degree or train a lot on the small amount of data, and yet generalize quite well. This is peculiar from just a science and math point of view. What is happening? Chiyuan Zhang and numerous other really smart, cool researchers have been looking at this problem for a while, and the question at hand is, is learning without memorization possible at this large of a scale? The answer is firmly no.

That memorization will and does happen, and it’s just a matter of how much memorization and what information is memorized. Zhang and researchers did an overparameterization test. They trained neural networks or deep learning networks with these amount of layers on just the 7, so just using the 7 to the left. They just showed the 7 again and again, and what they hoped was that the deep learning model would learn the identity function. You give me something, I give you it back. If you know linear algebra, just learning the identity matrix. We have that, that is their training data, and then we can see small, shallow learning networks, so up to about seven-to-nine-layer networks, it learned the identity function. It could say, ok, now I see a 4, here’s the 4.

Now I see a shirt, here’s the shirt, and so on. When we get to 20-layer networks, we just learned the 7. This is exactly how our biggest and most overparameterized model works, but it actually works well because, again, we had this much data, we put it in this much space. If some of it generalizes well, and some of it memorizes, sometimes we want memorization. I want to say, tell me the lyrics to this song. I expect to see the appropriate lyrics to the song.

What’s actually in the training data? Has anybody here actually looked at some of the training datasets? You ever downloaded them, played around with them? Get a Hugging Face account, just for funsies, and download some of the training data. This is from one of the big ones that was collected by an organization in Germany. It has women’s healthcare labeled as not safe for work. I’ve actually removed these people’s faces. It has mugshots, people who died in the street, and stuff like this. It has watermarked images and ads, and it also has people’s medical data that they didn’t release. Numerous people have had to ask for their data to be removed because they have their consent form. It says, please don’t show it, and then somehow that got forgotten, and their stuff got loaded to the internet, and it got scraped.

To some degree, why do we need to worry about overparameterization and memorization, and bigger and better models? It’s because we have the potential to have more memorized data that is also private, that’s also potentially problematic. There are some ways around this. Differential privacy, there is many theories and practices around differential privacy, which is one way that we can guarantee less memorization. Thank you, Gemma team. They literally just released the first from beginning to end differentially private-trained Gemma model. It’s called VaultGemma. You can take a look. Probably you’ve heard somewhere, somebody said they tried differential privacy once, it didn’t work, so we just give up. That’s not exactly true. When we take a look here, these were also released with VaultGemma. We see the line to the left is VaultGemma.

The line in the middle is the same Gemma model without differential privacy. Obviously, for something like Trivia, it’s going to score really low because Trivia requires memorization. For something like PIQA, it does pretty well in comparison. One thing I want to ask or think about is, when do you need memorization, and when would you rather have generalization and the potential to not accidentally output somebody’s private data? It’s just a question for us to think about. Also, goes back to better performance is not going to save us when it comes to privacy and security.

Myth 3: A New Risk Taxonomy Is All We Need

Third myth, a new risk taxonomy is all that we need. Just like Attention Is All We Need, now we just need a new risk taxonomy. Who here has worked with taxonomies? If they’re new to you, let me take you on a wild tour. If you’re working in AI risk and you’re having a look, you can go to the MIT repository. You can go to the NIST repository. You can go to the EU AI Act. By now, you’ve amassed probably about 800 pages of reading for yourself. Is this feasible for you to do in your free time? Just to inform yourself, like, no problem, just going to someday crack open the AI Act. Probably not. I’m here to tell you it gets even worse. We have the AI risk benchmark. This is actually a really cool paper if you work in risk, but it’s trying to categorize risk frameworks from around the world and then compare them across different regulatory environments and so on.

We end up here with like 40 to 50 types of risk. It’s like, how are we supposed to manage that when most people are doing privacy and security work because it makes them feel good about their work and not necessarily that that’s their only job. How are we going to navigate this? This is good. I’m not really a taxonomy person, so if you’re a taxonomy person, probably this stuff is great. I feel like the same people that use colored binders for everything are like the taxonomy people. It’s very good to have a taxonomy person in the team, but it’s very hard if you’re like a doer, a builder like myself. Let’s zoom in to the mitigations. OWASP, when we dive into the mitigations that OWASP recommends for the top AI risks, we see something like, implement automated scanning for anomalies and cryptographic validation of stored data. I don’t know what teams you’ve been working with, but most teams I know cannot implement their own anomaly system from scratch, and probably whether or not their cloud provider offers it may or may not be able to easily do cryptographic validation of data.

This is like out of the reach of a lot of teams who probably want to do AI security to some degree. We keep going, and then we have, limit knowledge propagation and ensure an agent does not use low-trust inputs. What about the training data that we just saw? How am I supposed to control what low-trust inputs were in the initial training data? I can’t control that. I’m going to open a ticket with Anthropic and say, could you please make sure you don’t use low-trust data? That’s not a real thing that most teams can do. The systems that I have, yes, I can control that perhaps. I don’t want to pick on OWASP, so here’s one that is really useful. I can talk about tool access. I can talk about permissions. I can talk about these things. There are useful ones, but what I’m saying is, with a lot of these risk frameworks, maybe some of these things are relevant and some of these mitigations are something you can do and others are not. We’re just simply not prepared for.

What can you do? The number one thing that I recommend is actually setting up what I call interdisciplinary risk radar. I was for a long time a principal of Thoughtworks working in this space, and I had a chance to develop this AI governance game with some of the other stakeholders in security and privacy, where we said, if we got the developers and the data people and the privacy and security people in a room together, could we have a conversation where we actually understand what’s relevant for us? Could we debunk myths? Because sometimes people will come to me and, I heard this is the biggest problem in security. I’m like, if you’re not developing your own models, you can’t do anything about that anyways. Some things are just not possible. Then you can actually expose what real threats that you have and what solutions make sense for the capabilities that you have on your team or in your organization. If you do this on a regular basis, you develop this muscle, this practice of, when you see something come across your feed or when somebody forwards something to you, you start to know, is that relevant for us? Is it something we should talk about on our next risk radar? Is this useful or is this not useful for the type of AI that we’re doing?

Myth 4: We Did Red Teaming Once So We’re Fine Now

Myth number four, we did red teaming once, so we’re fine now. Who here has done red teaming at least once? No? I have a YouTube course on red teaming, if you want some free content to figure out how to do red teaming. Does everybody here know what red teaming is? Yes? We’re like attacking systems to try to figure out, where do they break? Cool thing is we can develop even new attacks. We can take attacks from research. Many research attacks are now also open sourced. We can build an awareness, build this ability to attack things and understand. Hopefully, you do red teaming at least once, but maybe I’m here to convince you to do red teaming more than once. You can make this as like a fun product exercise because I think the best red teaming works from the team that actually knows what product or what service the AI is going into because you actually know how you might actually get around whatever it is you’re trying to build.

If you’ve worked in security for a while, you know this paradigm, but this is useful for people where security might be new to your capability. I think when people think of cybersecurity, they think of nation state level attacks. Perhaps you work in nation state level systems, then probably you should be worried about all sorts of crazy attacks. Most of the time, cyberattacks, or even just major cyber threats are just automation and good data scraping. Being on the right channels, seeing so-and-so’s passwords got leaked, and then trying them in new targets. This is 99% of how breaches happen. Or you found out this new vulnerability and now you just spam it across the entire internet until you hit something that might be valuable. Why is that? That’s because we have to think like the attacker in a lot of ways. That means, what are we actually going after? Do we want the LLM to output how to build a bomb or are we actually after something much more valuable?

The answer is usually we’re after something much more valuable. Usually, we’re after data. We’re looking for data that either we can hold hostage or we can resell or we can use. We’re trying to DDoS services or take services down or reduce quality so that somebody will pay us, so we can have the lulls on the internet or whatever. We might be trying to steal software or get into infrastructure so we can get to the data, so we can get to other systems. We might be thinking about disrupting a brand, this very targeted attack. Or we might be going after increasing costs. We might want to cause them pain by increasing the costs either in their person time or their compute time or whatever. When you’re red teaming, I actually want you to start here and decide, what’s the biggest target? What are you going to focus on today? Are you going to try to disrupt a service? Are you going to try to get data? You’re trying to steal software? What are you trying to do?

Then, you can attack, iterate, test, mitigate, repeat. You’re going to model the attack. You’re going to test the attack. You’re going to learn from it. You might have a mitigation or two, and then you’re going to repeat that. This is how we then build essentially security practice and security understanding for everybody. Why do we do this iteratively? It’s because new attacks will also come. It’s because our architectures and our implementations will change. It’s because maybe you’re testing out more than one model. It’s also because we’re focusing on the parts of the system that we can influence and control. We’re keeping it simple. If a simple protection works, like the software-based guardrails, then we use that, before we go reach for the most complicated solution. If we do this regularly, not only are we improving our own knowledge and understanding, but we’re also building infrastructure that we can reuse over time.

How can we do this for AI systems? We can start with threat modeling. There’s PLOT4AI. It’s open source. You can download it. It’s free. It goes over a whole bunch of AI risk categories for threat modeling. There’s also STRIDE, LINDDUN, if you want to add anything. We have our architecture, we’ve found the target, we’ve identified the threats, the potential ways to get in towards the target, then we integrate actual testing into our MLOps infrastructure. If you’re not doing AIOps, that’s fine for now, but even if you’re using somebody else’s machine learning model or AI model, I encourage you to start thinking about how you actually do integration testing and testing of that endpoint over time. Because if you ever want to switch out that model for something else, then you can have that testing already going. You can already be trying to see what’s happening there. This requires these skills.

If you have any of these skills, you can help with MLOps or AIOps. In addition, if you’re really offering products that even have somebody else’s AI model in them, you need to be doing cost testing, so you can do load balancing. I don’t know if people here are already doing LLM load balancing or other types of load balancing, but you can distribute your costs, your token spend across numerous models. You can do stress testing. You can decide what happens when the system is under stress. You can do evals. Who knows what evals is? Evals is like, I set up repeatable testing for my AI model or AI endpoint so that I can evaluate model A versus model B versus model C. Because I promise you, even small model versions can greatly change outputs. Even mini versions of an update can change an output.

This is something that if you’re using it in a real production system, or even just to write your code, you probably want your own evaluations to figure out, is it useful for you or not? Then, finally, obviously part of MLOps is monitoring. Whatever monitoring system you use, whether it’s one that you’ve built or one that you do, you want to monitor what’s happening in your systems so that if you notice certain threats actually popping up in your system, you can then decide to red team them and to add them to your next risk radar and to talk about them and integrate them into your testing.

Myth 5: The Next Model Version Will Fix This

Final myth, the next model version is definitely going to fix this. Like I heard from Anthropic, definitely Claude Code number five is going to be super great and not give me any bugs. No hallucinations anymore. There was a really cool report on looking at how do people use AI systems. This was collected across many different things and put together. It’s quite nice to read, but here’s a really useful graphic from it. We’re just going to look at the majority cases, 28.3% is practical advice. How do I do this? Make me a fitness routine. Teach me this thing, or build me a learning plan or something like this. Next biggest is writing. Edit this for me, help me think about this and so forth.

Then the third biggest is, what is X? Specific information. Do I have any product people? I have people that have been around product people long enough. We’ve all got the product person in the room in our head. We got the jobs to be done or the user wants to blah, blah, blah. That’s in your head. I ask you, if the user wants to get advice, ask what is X, or help with writing, where’s privacy and security on your priority list? Is it the number one thing that’s going to get in the next model release? No. We can laugh, it’s funny. We can relax and laugh. No. I’m going to make something that gets even better at writing, regardless of how we get there. I’m going to make something that’s really good at giving advice and being really kind and friendly. I love sometimes using AI models now because I feel so brilliant, when I log off my computer, I’m like, I’m the smartest human ever. Because it’s so like, Katharine, that’s a brilliant idea. Yes, I thought so too. Or, that is basically a replacement for Google Search. If that’s your product dream, that’s what you’re going to be building for. That’s totally fine. I’m not here to harsh anybody’s product goals. We can’t be waiting for it to save us.

Maybe there’s also some other product goals. I’m not here to tell anybody not to use any browsers that they want, use whatever browser you want. Literally on stage at a Silicon Valley panel, the Perplexity CEO was like, yes, we’re building a browser so we can do really good ads. It’s out there in the open. You don’t have to look far. They’re not the only ones. If you weren’t following the news, the Simon guy who really likes ChatGPT was talking about, can you give me a summary of my memory features? The memory feature was literally profiling him and saying, the user likes this, the user likes that, the user wants these things. This is profiling that’s happening if you have the memory feature turned on. It’s not turned on by default for European residents or EU residents, but it is for our American friends and probably numerous other geographies. This is profiling.

If you’ve ever worked in advertising, profiling is a really good start to delivering ads or other services. It’s also right out loud. OpenAI a few years ago started hiring for what they call model designer. You can look it up. It’s active on their careers page. You can maybe even become one and add a little privacy and security flavor to the model. These model designers, they’re really product people and design people. They now lead machine learning teams and say, we want to give this model this personality. We want to give this model these capabilities. We want this model to engage people in X, Y, Z ways, and test out this iterative thing to of course increase engagement, increase use, and increase our active thing. Have you ever noticed like an LLM now will always ask you a question at the end? It’s like LLM bait. Because then you want to answer it. Then you’re like, “I actually already got my answer. I don’t need to be here anymore”. This is the goal, which is totally fine. Again, everybody’s got to make money. We live in capitalism. I get it, but at the same time, we shouldn’t look at this and think privacy and security is going to be the number one priority for the next release.

Here’s me at Darmstadt. I was there for a big data conference that happens every year in Germany. Here’s me. This is my really cool gaming laptop. I built it myself from scratch. It has 30 gigs of GPU in it. Next door to my computer is one of the other organizers’ computers. I set them up. I got them serving, and we threw what I called a feminist AI LAN party. Who’s old enough to have ever been to a LAN party? I love LAN parties, and I’ve started throwing them again and I had a switch.

At one point in time, we got 30 people connected to me serving LLMs on my little machine. I mainly bring this up, A, because I really want people to host more LAN parties. The other reason is to try to diversify your model providers. Test out other things. Get an account on Hugging Face. You don’t have to build a laptop or your own gaming computer, but if you want to, I have a how-to on how to do that. Try out Ollama. Ollama works on everything now. Try out GPT4All. These are local models that you can run on your machine. Claude also has a lot of local only options and so does Copilot and so does other things. Try out some local models and really get curious about switching up your model provider. Just test it out. Maybe once you had a bad experience with Gemma or with whatever many years ago, but try it again. Just get used to testing out different things. Get used to testing them locally because I think it’s useful for us to know about. Whenever ads come and you don’t want an ad experience, get used to working locally. Also, if you get used to this, you start to build the experience of, how do I run a model?

At what point in time does it crash? How much memory does it use? So that you can try out cool, open-source, open-weight models. Obviously, all of the open-weight models are running locally. I would call Apertus, which recently got released by EPFL and ETH Zurich, along with support, I think, from the Swiss government, was maybe the first open-source model because they actually also listed all the training data they used. They listed privacy and security testing, and they also open sourced their training code, which is pretty cool. Also, if you’re working in German, I don’t know if it speaks Swiss German or Hochdeutsch, I’m not sure, but give it a try. I’m sure it can do both. These are ways that we can diversify your model providers, provide some resiliency, and decide if privacy and security become important to your org or certain aspects, then you can test out model A versus B versus C, and you can make your decisions. Because you’re not handcuffed to just one model.

At the end of the day, we can’t wait for somebody else at an AI vendor to come save us from a privacy and security perspective. Nobody’s going to swoop in like a superhero and say, “We’ve figured out how to solve all these problems. Here’s your new model that definitely doesn’t give you copyrighted code or whatever”. Only we can save ourselves. Everybody here’s a grownup. You probably already learned this, but it bears repeating. My question for us, because again, it’s about responsibility, agency, and ownership. I come originally from Southern California and we grew up with a lot of Smokey the Bear. Smokey the Bear was like, only you can prevent forest fires by not smoking in the woods. I was like 9, like, “I don’t smoke in the woods. I don’t understand”. The whole point is that only our own care and intervention is going to help reduce this risk.

What Can You Take On?

My ask for you, we’re going to do a little exercise. We’re going to go through all the different mitigations and things that we talked about. I’m going to ask you to clap, or whoop, or raise your hand, or do whatever it is you feel like doing if you see something that you’re like, I’m willing to opt into this. I’m willing to try this out. Just try it out, you don’t have to do it, just try it out. What can we take on? First, can we test and implement guardrails? Who’s up for that? Can we use or maybe even train differentially private models, who’s interested? Can we run an interdisciplinary risk radar at our organization? Can we develop robust security and privacy testing? Can we evaluate or maybe even use, and maybe you’re already doing this, open weight and local models?

Resources

I have a newsletter. I have a YouTube. I get you started on red teaming in some of my latest YouTubes. I have a book from O’Reilly. It’s mainly focused for other machine learning people or data scientists, how do we add privacy and security into normal data science and machine learning workflows? The German version also has some updates. It has some more recent attacks and things like this.

Questions and Answers

Participant 2: You talked a lot about Bloom filters and how can we put guardrails. Is there anything that can be done in the intrinsic model itself? Because at the end of the day, we all have to think more of the models. How can we save our data to be used as a training dataset?

Katharine Jarmul: This is a great idea. One really interesting piece of research recently came out on routing, so optimization of routing. The cool idea is that we’re starting to have enough models available that we can think about an actual router. This router can operate of, it takes in a request. It decides which model is the cheapest model to still also accurately answer this request, but you could also add in privacy or security or any other concerns that you have for that. You essentially train this router, and then that router decides, or sometimes early on, it doesn’t know yet, so it will sample from the models, and then you give feedback: it worked for me, it didn’t work for me. What they found is this reduced like 60% of cloud costs, because more often than not, we’re totally fine with the cheap model or the local model, but we’re just paying and using the pro, or most pro, elite, whatever. I’m going to be adding some GitHub repos on this, that we can also add privacy and security evaluation into this, and we can decide, maybe even at an organizational-wide effort, when to shift to a local model for internal confidential information, and when to shift to maybe a cloud model for other things. I think this will only increase over time, but it’s really good intuition.

Saving your traces, saving your data and your evaluations is a really good first starting point to then training your own guardrails or training your own router that can also implement guardrails. Purple Llama is open source. There’s a whole class of models from Meta called Purple Llama. They do everything from prompt injection attacks to things like, we think this is private, we think this is crime, we think this is inappropriate, or harassment, or whatever. That’s all an option. There’s also plenty of good research on also prompting your own LLM-as-a-judge or something else. I think at the end of the day, you probably should eventually train your own guardrails. You won’t train it into the model because you’re probably not training models from scratch, but you will use that external algorithmic one and you just have a filter on what gets through to the LLM and whatnot.

See more presentations with transcripts

Busting AI Myths and Embracing Realities in Privacy & Security

Transcript

Myth 1: Guardrails Will Save Us

Myth 2: Better Performance will Save Us

Myth 3: A New Risk Taxonomy Is All We Need

Myth 4: We Did Red Teaming Once So We’re Fine Now

Myth 5: The Next Model Version Will Fix This

What Can You Take On?

Resources

Questions and Answers

Leave a Reply Cancel reply

Stay Connected

Latest News

to watch Formula 1, Internet users will no longer be able to use VPNs and illegal IPTV

Wednesday: US ban on routers expanded, teenager arrested for ransomware

“For OpenAI it’s a liberation”

OpenAI is said to have missed sales targets: Why this is now causing AI and chip stocks to falter

World of Software is your one-stop website for the latest tech news and updates, follow us now to get the news that matters to you.

Quick Link

Topics

Sign Up for Our Newsletter

Transcript

Myth 1: Guardrails Will Save Us

Myth 2: Better Performance will Save Us

Myth 3: A New Risk Taxonomy Is All We Need

Myth 4: We Did Red Teaming Once So We’re Fine Now

Myth 5: The Next Model Version Will Fix This

What Can You Take On?

Resources

Questions and Answers

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.

Leave a Reply Cancel reply

Stay Connected

Latest News