Archive.fm

The Data Stack Show

208: The Intersection of AI Safety and Innovation: Insights from Soheil Koushan on LLMs, Vision, and Responsible AI Development

This week on The Data Stack Show, Eric and John welcome Soheil Koushan, a member of the Technical Staff at Anthropic. During the conversation, Soheil discusses his journey from self-driving technology to his current work at Anthropic, focusing on AI safety and Claude's capabilities. The conversation also explores the allure of machine learning, the challenges of ensuring AI safety, and the dynamic between research and product development. The group emphasizes the importance of responsible AI development and the complexities of defining safety across different cultures, while also highlighting the transformative potential of AI, the need for ongoing dialogue about its implications, and so much more.

Broadcast on:
25 Sep 2024
Audio Format:
other

Highlights from this week’s conversation include:

  • Soheil’s Background and Journey in AI (0:40)
  • Anthropic's Philosophy on Safety (1:21)
  • Key Moments in AI Discovery (2:52)
  • Computer Vision Applications (4:42)
  • Magic vs. Reality in AI (7:35)
  • Product Development at Anthropic (12:57)
  • Tension Between Research and Product (14:36)
  • Safety as a Capability (17:33)
  • Community Notes and Democracy in AI (20:41)
  • Expert Panels for Safety (21:38)
  • Post-Training Data Quality (23:32)
  • User Data and Privacy (25:32)
  • Test Time Compute Paradigm (30:54)
  • The Future of AI Interfaces (36:04)
  • Advancements in Computer Vision (38:46)
  • The Role of AGI in AI Development (41:52)
  • Final Thoughts and Takeaways (43:07)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

[MUSIC] >> Hi, I'm Eric Dodds. >> I'm John Wessel. >> Welcome to The Data Stack Show. >> The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. >> Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. [MUSIC] >> Welcome back to the show. We are here with Soheil Koushan from Anthropic. Soheil, we're so excited to chat with you. Thanks for giving us some time. >> Yeah, of course. I'm really excited to be here. >> All right, well, give us just a brief background. >> So I started working in AI in 2018. Self-driving was my first gig. I worked for five years at a self-driving trucking company called Embark, trying to make big rigs drive themselves, and then continued my work in AI by joining Anthropic. I joined earlier this year, and I've been working on making Claude better for various use cases, especially in the knowledge work domain. >> So, Soheil, we talked about so many topics before the show. It is hard to pick a favorite, but I'm really excited about talking use cases, and talking about common mistakes people make when interacting with LLMs. What are some topics you're excited about? >> Yeah, I think I'd love to just talk a bit about where I think things are going from here. >> Awesome. Let's dig in. >> So, Soheil, what interested you about machine learning? I mean, that was something that you wanted to explore. You considered graduate school, you ended up joining a self-driving startup. But of all the different things you could have done in the technical or data domain, machine learning drew your attention. What kind of specific things were attractive? >> Yeah, I might be oversimplifying it, but to me, it felt like magic. Seeing some of the early vision models do things that I, as a software engineer, as a technical person, had no idea were possible and had no way of explaining how they worked. Anytime something is cool and you don't know how it works, it's indistinguishable from magic; there's a quote that goes along those lines. And then came the realization that, wait, I could do this. I could be a magician, I could build this, I could figure out how it works. So I think it was just a shock around what I would have previously thought was impossible. >> Yeah. Do you remember one of the specific moments when you saw a vision model do something, and, there are probably multiple, but one of the moments where you said, okay, this is totally different? >> Yeah, I think for vision, it was probably the early bounding box detectors. I remember playing around with classical, heuristic ways of trying to understand what's in an image using OpenCV, and then seeing the first real deep learning-based bounding box detectors that could also track objects over time. After having played around with algorithmic computer vision, seeing, oh, whoa, this is way better, it's able to work in a variety of conditions and different angles, was really cool. And I had a very similar moment in LLMs. I remember seeing the, I think it was GPT-2 or 3, blog post that had Alice in Wonderland, maybe the whole book or maybe a chapter. And it was recursively summarizing it to distill it from 30 pages to 10 pages, and then from 10 to five, and then eventually one paragraph.
That was one of those holy crap moments for me. It requires an actual deep amount of understanding of the content to be able to summarize it, and then to be able to do it with new phrasing, new ways of rewriting the story, was a leap that I had never seen before up until that point. So those are two key moments that I remember. >> So we're going to spend a bunch of time talking LLMs and AI, but I have to ask on the computer vision side. I mean, self-driving is still very present in the news and stuff, but computer vision in general, I think LLMs have really taken over the press. What are some computer vision applications that you think people don't know about, or some really neat things that maybe wouldn't show up for an average person? >> Yeah, so I hate to use self-driving because it's probably overdone, but also the average person probably doesn't know that if you come to San Francisco, you can take a self-driving vehicle anywhere around the city completely autonomously, and you can download the Waymo app and do it today. There's so much work that's gone into that, over 10-plus years of engineering, and I think it's definitely still the coolest application of computer vision. I do think that on the longer time horizon, VR will probably be a very interesting application of computer vision too. I saw Meta's latest release, Segment Anything 2, I think, was the model that they shared. It is a transformer-based model that essentially allows you to pick any arbitrary object and have it segmented, to understand the semantics of, okay, this is an object versus background, but also to track that object over time in a way that is extremely robust, especially once it goes out of the frame and comes right back. So there are so many cool applications in VR, and I think the technology is advancing pretty quickly. And maybe even stepping back from VR, people are working on humanoid robots. I think that's a whole topic worth discussing, and I don't actually have strong opinions on it. But a humanoid robot would require a level of computer vision understanding that actually goes beyond what cars are able to do today. So that's, I think, another area where vision will become really important. >> Yeah, and it's always fascinating to me, right? A lot of times you see the advances, or the threshold of real usefulness, hit after everybody has kind of moved on. So let's say everybody moves on to humanoid robots, and then all of a sudden cars finally hit that, oh wow, we're here, moment, but everybody else has kind of moved on. And to a point, I think that happens because if you can get it right for robots, which is an even harder goal, you solve some of those downstream problems that were needed for that last 5% for cars or for trucks. >> Yeah, and there are real applications of computer vision today, like in manufacturing and factories; there are robots that do a lot, and a lot of them have really advanced, cutting-edge computer vision going on. So beyond just the futuristic use cases, there are a lot of really cool use cases today. >> OK, so you saw early major advances in computer vision and then in LLMs, and it was magical.
But now you're behind the curtain, or have been behind the curtain. Does it still feel as magical, or do you feel like a magician? >> That's a great question. Yeah, so one of the things that's kind of surprising is that when I was working on self-driving, I kept being a bit of the pessimist. I was like, hey, I think this will take longer than people are saying. I think we're being a little bit optimistic. There are so many situations where it can fail, and the level of reliability we need is so high, that it's further away than people think. And I don't feel that way about LLMs and transformers. I actually feel like the hype is warranted. And in both situations, I was behind the curtains, right? So my takeaway is that I do think this is real. I do think this is magic that we're building, and I do think it will progress really rapidly. And yeah, I'm super excited to be a part of it. And I do think Anthropic's founding makes a lot of sense when you're aware of just the rapid pace of progress in the space. >> I wanted to actually dig into that. I've really enjoyed consuming Anthropic's literature, because I think, number one, the clear articulation of incredibly deep concepts is absolutely outstanding. But two, I think you address a lot of really serious concerns around AI and the future, and specifically, like any technology, what happens if it's used in ways that are really damaging. So I'd just love to dig into that and hear from someone inside of Anthropic. Maybe one thing we could start out with, and I think this is something a lot of people talk about, especially if you think about your average person, right? They're not deeply aware of the inner workings of an LLM or transformers or the other components of this. So what are the dangers? How do you think about the real dangers that make safety such a core component of the way that Anthropic is approaching the problem and the research? >> Yeah, I think my mental framing of it is that this is incredibly powerful technology, and incredibly powerful technology can be used for good or for harm. And this is true for all kinds of technological innovations that we've made, right? Social media can be used for good or for harm. The Internet has obviously been 99% good, but can also be used for harm. But I think the current pace of AI progress is showing us that the technology is super, super powerful. And I try to put myself into the mindset of the Anthropic founders, right? They were part of OpenAI, working on research there. I think Dario was head of research or VP of research at OpenAI. And they're seeing the progress being made from GPT-1 to 2 to 3, and they're like, okay, this is going to be huge. This is one of the most powerful technologies that humanity has ever created. It's very possible that in a few years we'll have something like superintelligence. We need to think about this seriously; this is pretty serious stuff. We need to think about the implications of this, right? And the other thing about AI and the current technology is that it's kind of inevitable. Even if, you know, OpenAI were to suddenly stop building it, other people will build it, and it will happen.
So it's almost a necessity that someone is taking a good, hard, serious look at the implications of what we're building, in a way that's maybe a bit more serious than back in the social media days, where it was like, this looks fun, let's just build it, and not really think through the implications of this technology. So that's kind of the Anthropic mission. I think it's basically to ensure that the world safely makes the transition through transformative AI. Wherever that AI is built, it'll very likely be built at one of these few big labs, but what's most important is that the transition humanity is making goes well, that the world ends up being in a better place in the end. So that's the mission, and I think everything that Anthropic does is connected to that mission. Doing interpretability research, doing safety research, doing capabilities research, building products, are all in service of this bigger goal. >> Yeah. How does the product piece play into that specifically? Because it's an interesting approach, right? Usually product comes sequentially after research. If you think about academia, you have a bunch of research that's done, and it's like, okay, well, we could build a company or a product around this. And those things are happening very simultaneously at Anthropic, at a very high level, or pace, I guess. I'd love to know how that works and why. >> Yeah. I mean, I think product is incredibly important, and Anthropic is investing heavily into it. You know, we hired Mike Krieger, the co-founder and former CTO of Instagram, to lead our product work here. And I think it's important for a few reasons. One, getting your technology into the hands of millions of people is really helpful for understanding it, for figuring out the dynamics of how people use this thing when it's out there. In what ways does it work, and in what ways does it not? Because again, if the goal is to make this useful for humanity, it should be interfacing with humanity. We should figure out how humanity is going to be interfacing with it so we can learn and make it better, and maybe make it more steerable. We figure out what people care about and don't care about, and that actually feeds back into our research. So that part is super important. It's also super important as a business: Anthropic needs to have a thriving business. It needs to be a serious player from a financial perspective to be able to have a seat at the table, whether that's in the space of government, or in the space of having investors invest in Anthropic so we can continue our work. And so I think those two together make it so that product is very important for us. >> Is there a tension in the company between the research side and the product side? And when I say tension, I don't necessarily mean in a challenging way, although I'm sure there are some challenges, but is there a healthy tension there in terms of balancing both of those in the way that the company operates? The outcomes, and the way you would measure success historically, tend to be very different. >> Yeah. I actually think that it is very healthy here at Anthropic. Specifically, research breakthroughs create space for new, incredible products, and relaying that all the time to the product folks is super valuable.
And then the inverse is also true where, hey, we have this product, but it's really lacking in these specific ways. These can feed back into research to figure out, well, why can't Claude do this? How can we make it better at this? And so this constant back and forth between product and research is, I think, really key to building long-lasting and useful products. Like Artifacts is, on the surface, just a UI enhancement; you could recreate Artifacts in other places too. But because of this constant back and forth between research and product, we're able to come up with paradigms, figure out things that work and don't work, ship them, and create really meaningful value for people in a way that I think you're not seeing as much of in industry broadly. You're especially seeing it at startups; I think startups in particular come up with really good ideas, but at the biggest companies, everyone's kind of working on the same thing. So yeah, I do think that sort of interplay is really important. And another one is just, well, what about safety and product, right? Or what about safety and research? How does that play into the other tensions there? I think one thing that's really helpful there is the Responsible Scaling Policy that we have, which basically sets the rules as to what kind of models we are willing to ship and how we test them for the things that we care about. Like, does this model make it easier to create bioweapons or not? And if that's the case, then we will not ship it, regardless of whether we have really cool product breakthroughs that would go on top of it. It kind of becomes the goalposts and sets the stage. And as long as we all agree on the RSP, the need for one, and to some degree the details of it, then you can debate the RSP, hey, are we being too strict or not? But the decision about whether or not to launch something is just about whether it fits within the RSP. It's not like I want to ship versus you want to ship; it's an objective question of whether it fits within the RSP or not. So that's a really cool tool we have to be able to scale responsibly and make sure that everyone's aligned and on the same page about it. Another note I have on this is that I kind of view safety as a capability. We talk often about this idea of a race to the top. If we're able to build models that are less jailbreakable, that are more steerable and follow instructions better and don't cause harm for people, that creates an incentive for everyone else to match us in that capability. And these are capabilities people will be willing to pay for: a customer support bot that doesn't accidentally say rude things to the customer, or accidentally make decisions that it shouldn't, and that is really good at instruction following. Those are capabilities. It's not jailbreakable; you can't convince it to give you a discount, right? Those are things that are actually valuable for people. And so safety and capabilities a lot of times are actually combined. One thought experiment I have is, if you truly had an incredibly capable model, then you could just tell it, hey, here's the constitution, here are the 10 things that humanity cares about, follow it.
And then you're done, you know, because it's so capable in its understanding and knowledge, and it can think about things really deeply, that giving it the exact list of instructions you want it to follow means it can be perfectly aligned to those, right? So that's a bit of a thought experiment, but I do think there's actually overlap between safety and capabilities. >> Yeah. I love that. I know you have questions about data, but I have one more question on this safety piece and Anthropic's convictions and view of things. So we talked about a model that harms someone, and I think one of the really fascinating questions about developing AI models is that if you look around the world, the definition of harm in different cultures can vary. How do you think about developing something where safety is a capability when there is some level of subjectivity in certain areas around the definitions that would define safety as a capability? >> Yeah. This is really hard. Different cultures have different definitions of harm. And I think hopefully we get to a world where, to some degree, it is almost democratically decided what we're training these models to do and how we're asking them to behave. I think for now, the best we can do is come up with a common set that has the biggest overlap with the most places in the world and follows all the rules and regulations that every place has decided on, so it's like the minimal overlapping set. But in a future where we have really easy-to-customize models, you could give it a system prompt and say, hey, actually in this country, it's a bit more okay to talk about this, or in this country, it's not okay to talk about this in this way. And hopefully we can give people, to the degree that is reasonable, the ability to steer the model to behave in a way that makes sense for their locality. >> Yep. >> There are limits, of course. >> Sure, sure. Yeah, I love that you just said, this is really hard. I was like, yeah, that's sort of fundamental. Philosophers have been debating the roots of that question for millennia. >> Yeah, and I think it kind of happened with Elon and Twitter. He bought it, he's all about free speech, and then he realized, okay, well, there's a reason we have some level of fact checking, and now Community Notes is actually a very prominent feature. As soon as you think about it a little bit further, you realize that there's some level of democracy or community or connection or alignment that needs to happen between groups of people; it's never purely clear cut. >> Yeah. Yep. So on the data side, I've been excited about digging into this. Obviously you have a ton of data that you use to train these models, and a ton of compute required, so a huge large-scale problem there. I want to talk about that, but actually, some other things you said prompted this question in my mind. When you were talking about, we wouldn't want to ship a model where you could build a bioweapon, how do you get the right people in the room to know that would be possible? Because I don't know anything about bioweapons, and presumably you don't either. So let's start there with data: how do you even know what you have? Do you have kind of a panel of experts that span a bunch of different knowledge domains? Is that basically it?
So, you know, we have teams of people who are focused on exactly those sorts of questions. We leverage external experts, we leverage government agencies, and we do all kinds of rigorous testing to understand risks around bio, around nuclear, around cybersecurity. It really is a panel of experts that contribute to making these decisions. >> Yeah. That's awesome. So on the technical side, tell us a little bit about that. How does that look? Obviously, it's tons of data that goes into this training. What are some scale problems, technology-type problems you guys have faced? >> Yeah. I mean, the scale of data is massive, right? Trillions and trillions of tokens, dealing with the entire internet of text data, but also dealing with the technical side of that. In many ways, dealing with the entire internet is not just a data storage issue; there are all kinds of other problems with internet data. And there's multimodal data now too, obviously, right? There's a lot of that, and it takes up a significantly larger amount of space, is much harder to process, and there's networking and all that. So the data challenges are massive. On the other side of that, I do think there is a cognitive core that we need to get to when it comes to building LLMs. Right now, and this is something that Karpathy mentioned in a podcast maybe a week or two ago, a lot of the parameters of these big models are going into memorizing facts. The core common sense and cognitive capabilities could be distilled into a smaller data set. And this is where I think bigger models can help train smaller models, to teach them to reason, to know the basic information. Models don't need to know every single thing that happened on Wikipedia, but they need to know how to look things up on Wikipedia if that's something that needs to happen for a given user request. So yeah, I think data is really important. But once we have really capable models, we can try to distill that back down into very, very core cognitive capabilities that can then be used to create new data and have the models run on their own and learn from their own mistakes. And that can help address the data bottleneck too. >> So I'm curious about two things on the use case side: how do you all use LLM technology internally? And then, as a follow-up, how do you see people outside of your world using it, and maybe what are some of the mistakes they make? >> Yeah. Personally, I use Claude the most for coding. So I think most of my queries involve, hey, make this better, or how do I do this in this language? But I think a lot of it is also just general world knowledge: hey, my drain is clogged, what would be the right thing to use? Things that would previously go into Google, where you'd have to open some blog post with 16 ads, and at the very bottom it's like, okay, baking soda and vinegar. Now it's just very direct. >> Yeah. That's my one, the recipes, where you have to scroll so far, it's like 30 pages and the recipe is at the very bottom. >> Yeah. Totally.
It starts by explaining their life story and exactly how they make this, and it's like, okay, I just want the recipe, just teach me how to make it. >> Yeah. So it's just for all kinds of common queries. One fun example: I have a friend who is a teacher, and I reconnected with him after a long time and told him I work at Anthropic, and he was like, oh yeah, I use it all the time for lesson planning. It takes care of so much of that. He'll say, hey, today I want to do a lesson about X, come up with some ideas and then make some homework assignments, and he says it does a great job at that. So there are all kinds of things in the context of work that are super helpful. I use it a lot to just do question answering. Instead of reading some long thing, I'll just take it, throw it into Claude, and be like, hey, this is the specific thing I'm looking for. Is it in here? Can you answer it? And that's a big time saver. So I should probably talk to more average consumers to understand where they use LLMs, but I think most people aren't aware. Probably the average person in the world has never heard of Anthropic, and probably the average person in the States hasn't really used LLMs to their maximum potential. So I think it would be really interesting to figure out where the discrepancy is, where people are not aware of how LLMs can make their lives easier. Because I think it's easy to be in an echo chamber like San Francisco and assume that everyone's using it exactly the way that you are, but I think that's probably very far from reality. >> Yeah. So on the prompting side, I just want to ask: there's a lot out there, people have done some pretty wild things with prompting and created personas and all that kind of stuff. I'm curious, from your perspective, what do you think are the most helpful things, just broadly, that you can do when you're trying to get the best answers out of an LLM when you're interacting with it? >> Yeah. So we actually have this tool called the metaprompter, where you tell it, hey, I'm trying to do this, can you help me write a prompt, and it'll recursively work on the prompt with you to make it better and best suited for an LLM. So that's an example of a tool that I think can help people do prompt engineering. Honestly, there aren't very specific tips that I have when it comes to prompting. I think using that tool can help you see examples of, oh, this is what a good prompt looks like versus a bad prompt. But in general, making what you're saying easy to follow and having examples is probably the advice you would give to any person trying to explain something, and I think it is especially true in the context of LLMs; examples in particular really help models figure out what you're trying to do. >> Yeah. What I've seen, which I think relates to that, is it seems like technical people especially want to program it, right? Want to be like, okay, well, how do I prompt it? And then I've seen some very complicated, oh-an-engineer-wrote-this type of prompts.
And it's relatively hard to benchmark that versus a simpler prompt. But I've also seen some very simple prompts that seem to have pretty similar outputs. Is that your general experience too, where there's some really complicated stuff and some simple stuff, and maybe the gap isn't very big between the two? >> Yeah, I do think that as your instructions get bigger and bigger, models today do struggle with internalizing all of it and may start forgetting little pieces of it; it's not perfect. So yeah, if you can distill it into the most key, simple parts, I think that would generally be helpful. One other tip when it comes to prompting is to think of every token that the model has, as input and as output, as compute units. By, for example, telling the model, hey, can you explain my question and describe your understanding of it before answering it, what you're doing is two things. One, you're just giving the model more ability to compute; every single forward pass causes some amount of computation to happen, and you're giving it more of a chance to think. I think that can be pretty helpful for a very complicated question. But also, you're giving it a chance to think out loud and put things down on paper, and every single time it puts down a token, for the next token it can look at what it wrote down previously. So having the model be very explicit, think out loud, be descriptive, and reason costs you more money, right, because more tokens have to get processed and it costs more from a compute perspective, but that can then help make the model smarter and gives you answers that are better aligned with what you want. So when I was putting together an eval, one thing I added before my actual question was: describe this document, figure out what the relevant parts are, and then answer this question. That sort of thing can help a lot. And I guess we're entering this paradigm now of test-time compute, where you can scale train-time compute, which is trying to put more into the model, but you can also scale test-time compute, which is having the model explain itself and think out loud and do chain of thought. And it turns out that can scale pretty nicely with capabilities, especially for certain types of things like problem solving and math and coding. So that's a lever you can pull: you can use a bigger model, or more compute going into training, or you can ask it to think out loud more and leverage test-time compute to get a better answer. >> Interesting. Yeah, that's actually a very helpful way to map your interactions to those different modes of compute. That's super interesting. >> Well, we're going to close in on a topic that we probably could have started with and taken up the entire show with, which is looking towards the future. One of the ways I think would be fun to frame this: we've just been talking about very natural, day-to-day ways that we interact with Claude, right? How do I unclog my drain, make this Python code better, explain my question, all those sorts of things. But when we think about it, I love the way that Anthropic talks about the concept of the frontier.
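To make the "think before answering" and test-time compute ideas above concrete, here is a minimal sketch in Python of that prompting pattern using the Anthropic SDK. The helper function, prompt wording, and model string are illustrative assumptions rather than anything described verbatim in the conversation, so treat it as one way to spend extra output tokens on reasoning, not an official recipe.

```python
import anthropic

# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set.
client = anthropic.Anthropic()

def ask_with_reasoning(question: str, document: str) -> str:
    """Hypothetical helper: buy the model extra test-time compute by asking it
    to restate the question and reason out loud before the final answer."""
    prompt = (
        "Here is a document:\n\n"
        f"<document>\n{document}\n</document>\n\n"
        "Before answering, describe this document, restate my question in "
        "your own words, and reason step by step. Only then give a final "
        "answer.\n\n"
        f"Question: {question}"
    )
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model string; check current docs
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # The reasoning tokens cost more, but each new token can attend to what
    # the model already wrote, which is the test-time compute trade-off above.
    return response.content[0].text
```

The same pattern, simple instructions plus a request to think out loud, is also consistent with the "simple versus complicated prompts" observation above: extra reasoning room tends to help more than extra prompt complexity.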
Both in terms of research and product and models. One thing that's really interesting about the way that most people interact with AI, at least two interesting things to me: one is that it is so consumer in nature. To put it in a very primitive analogy, opening Claude just feels so similar to opening Facebook Messenger; the interface is very similar, there are just so many ergonomics that are really similar. That's one way, which is very consumer and is ironically just not super different from a lot of interactions that have come previously. The other really interesting thing is that, in many ways, it's disappearing into existing products, right? Increasingly, the products we use will have new ways to use features, or new features that feel extremely natural, but are like, whoa, that was really cool, right? And it's like, okay, well, there's something happening under the hood there. But it's so natural within the product experience that the AI component is sort of blending into the product in a way that isn't discernible, which actually, to your point earlier, felt kind of magical, right? And maybe that's the point. Those don't feel super different to the consumer, necessarily, or to the person interacting with it. It just feels like a more powerful version of things that we were doing before, and that's probably an understatement. When we think about the frontier, and especially the research and all of the crazy things, the need for Anthropic to ask, could you create a bioweapon with this, feels so distant from the way that a lot of us interact with it. So that was a very long preamble, but how do we think about the future? Because it's kind of hard as a consumer to think about the power and the future of AI, since the way we interact with it on a daily basis almost obfuscates it a little bit. >> Yeah. I think part of the explanation for why people don't fully understand the safety implications is maybe because we've, as an industry, done a pretty good job of doing RLHF and making sure that the models act in a reasonable, aligned way. I think if we put out the base model, with no alignment work done on it, people would be like, whoa, this model just completely ripped into me and made me feel shitty, or whoa, it just taught me how to do something that's pretty illegal. We've done a good job of preventing those sorts of interactions, and so people are like, oh, they're super safe, they're super harmless. And it's like, great, that's exactly what we wanted to happen. And this is just today; as they become more and more capable, it becomes a bigger problem. But yeah, I think that means we've done a good job of aligning them, making sure that they act in ways that people would expect and are harmless. And on the point about user interaction, whether it's a specific app or AI disappearing into the product, and just user interaction broadly: I tell my parents that I work in AI at Anthropic, and my mom was like, oh man, it's so scary, things are changing so fast, I'm going to be so obsolete, I wouldn't even know how to use the future thing.
And I'm like, actually, the future thing will be way easier to use than anything you've ever used in the past; you will be able to talk to your computer. Forty years ago, you had to be an expert to use a computer. You had to understand the command line and know exactly the command you needed to execute something very specific. Today you can literally talk to your phone and be like, hey, how's the weather for the trip I'm going on next week in New York? And it'll be like, here is the weather. It becomes more and more natural and more and more human-like, which is actually going to increase accessibility and make all these things easier and easier to use. And I think there is a little bit of jumping the gun, where people know this is where things are going, but if you build it before it's ready, you end up with lackluster product experiences. Like, okay, an AI for creating slide decks. This sounds cool, let me explain the slide deck that I want. And it does kind of a half-assed job and doesn't really create exactly what you want, and that creates a bad user experience, and then people start distrusting it a bit and don't use your product anymore. There's definitely a certain level of capability that needs to exist for a feature like that to actually feel magical, to actually feel useful, and to not be frustrating to use. But once those are there, interfaces will be very natural. They will be the most natural human interfaces that we've ever had. So yeah, I think a lot of it will be disappearing into the things that we use every day. Your laptop will be completely AI-based or AI-driven, and the way you interact with your phone will be like that too. And some of it will create full new modalities. One really cool idea I have is, I think in five years, maybe more or less, you can be like, okay, I'm trying to install this shelf and I don't quite get it. You would just pull out your phone and be like, yeah, this is the shelf, these are the instructions. And then you'll have this video avatar that pops up and talks to you, has a virtual version of the shelf, and says, okay, you see this part of the shelf, drill this part, and then you'll look at your thing and be like, oh, okay, I see. And this will be generated on the fly, and you can't get more intuitive than that: a literal person in your phone explaining something alongside what you're seeing right outside of the phone. That sort of thing will, I think, very likely exist. So yeah, it's going to be a crazy future. >> Wow. That's pretty wild, actually. I was just putting desks together for the kids, and you get those things with the little Allen wrench, and the sequence is important; if you get one thing wrong, you practically start over. So let me know. >> Actually, yeah, you'll be the first to know. I'll let you know. >> So full circle, now I'm curious: you spent time with computer vision and now with LLMs, and we talked about different applications for LLMs. I mean, chat is the one everybody knows. Are there some cool things going on with computer vision type technology and LLMs?
I mean, I've seen some things, but what are some things that you see in the future for that? >> Yeah. So Claude is multimodal, so you can take a picture of something, whether that's some document you're looking at or something in the physical world, and ask questions about it. And it's particularly good at explaining what it sees and going through it in a decent amount of detail. But the area that I'm most excited about is actually kind of away from what I was working on before, which was the natural world, computer vision on images, and toward vision on digital content. So a PDF, right, or a screenshot of your computer, or a website. That as an input exists today, and I think it'll get better and better. And then there are related capabilities: the first demo of multimodal ChatGPT, I think, was here's a sketch of a website, take a picture, throw it in, and have it try to write the code for that. That will get better and better over time. And obviously there are multimodal output models like DALL-E, right, where you can ask it to generate an image. There's now video with Sora, and a bunch of other companies doing that. Audio output too, with voice mode that's coming, and Google has their own, and there are a bunch of others like Moshi. So there are three main modalities, text, audio, and vision, and they can be at the input or the output. In the case of Claude, you have text and images as inputs, as well as text as output, but this list will continue to expand in the future. And GPT-4o is actually a three-modality-in and three-modality-out model. I do think that's the future. I think vision in particular is especially useful. Audio, just a personal product take, is I think very useful from a product perspective. I don't think audio is adding new capabilities into the model, but it is a much richer, more human way to interact with it, whereas vision is truly a new capability. You cannot describe that table and the hole and where to drill it as text; you know, you could, but it would be way, way harder than, here's an image, do this. So I think vision actually does add new capabilities. And you're seeing a lot of that; one of my focuses is on multimodal vision in the context of knowledge work. How do you make Claude really good at reading charts and graphs and being able to answer the common questions you might have about a report and stuff like that? That, I think, is super valuable. One thing I'll also add, connecting the prior self-driving work to what I'm working on today: people talk about AGI, and I kind of think that AGI, depending on how you define it, is already here. These are general purpose models that can perform generally intelligent behavior, and it's more a question of what data you feed in. When I was working on perception and vision, it was a very narrow model; it could do bounding boxes on cars and pedestrians and lights and stuff. But we were slowly starting to make it more general, slowly starting to add other types of things that you wanted to detect, whereas Claude and transformers, and autoregressive transformers in particular, are general purpose thinkers. They're general purpose next-token predictors, and so many things can be framed as a next-token prediction problem.
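As a concrete illustration of the image-input use described above, reading charts and graphs in knowledge work, here is a minimal sketch of sending a chart screenshot plus a question to Claude through the Anthropic Python SDK. The file name, question, and model string are placeholder assumptions; verify the exact content-block format against the current API reference.

```python
import base64
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

# Hypothetical screenshot of a chart pulled from a report.
with open("revenue_chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model string; check current docs
    max_tokens=512,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_b64,
                },
            },
            {
                "type": "text",
                "text": "Summarize the quarter-over-quarter trend in this "
                        "chart and call out any quarter that looks unusual.",
            },
        ],
    }],
)

print(response.content[0].text)  # the model's read of the chart
```

The image goes in as an ordinary content block alongside the text question, which reflects the point above that the same general next-token-prediction engine can handle multiple input modalities.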
And so that's one of the things that I see that's different about what I'm working on now versus before: I'm working on something very general, which is why audio just kind of works. You discretize it, tokenize it, and then throw it in, and with some tricks and a bunch of other things, you have the same engine that's creating text output also creating audio output. And I think that's super cool in general. It's the same way that your brain is a general purpose cognitive machine: there have been people who have had different parts of their brain ablated, and suddenly they can't do a specific skill or a specific type of motion, and then other parts of their brain reconfigure and allow them to do that over time through retraining, especially if they're young, right? So there's specialized tissue in there, but it's a general purpose system. And I think we've unlocked that. We have found a digital analog to a general purpose cognitive engine, and now it's just a matter of scaling it. That's the way I feel. >> Wow. Well, Brooks is messaging us that we're at the buzzer, although I could continue to ask you questions for hours, or perhaps days. Soheil, this has been so fun. I cannot believe we just talked for an hour; I feel like we hit record five minutes ago. Really appreciate the time. It's been so wonderful for us, and I know it will be for our audience as well. >> Yeah, thanks for coming on the show. >> I'm really glad to hear that. I appreciate you guys. This was really fun, and I hope people get some value out of it. >> The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com. (upbeat music)