
The Data Stack Show

197: Deep Dive: How to Build AI Features and Why it is So Dang Hard with Barry McCardel of Hex

This week on The Data Stack Show, Eric and John chat with Barry McCardel from Hex. They delve into the technical, business, and human challenges of data work, emphasizing AI and data collaboration. The discussion also covers Hex's product updates, the complexities of building AI features, and AI's impact on data teams. The group explores the unpredictability of AI, the need for extensive evaluation, and the iterative process of refining AI models. The conversation wraps up by touching on industry consolidation, vendor competition, the dynamics of cloud platforms and open source technology, and more.

Duration: 1h 3m
Broadcast on: 10 Jul 2024
Audio Format: mp3

Highlights from this week’s conversation include:

  • Overview of Hex and its Purpose (0:51)
  • Discussion on AI and Data Collaboration (1:42)
  • Product Updates in Hex (2:14)
  • Challenges of Building AI Features (13:29)
  • Magic Features and AI Context (15:22)
  • Chatbots and UI (17:31)
  • Benchmarking AI Models (19:06)
  • AI as a Judge Pattern (23:32)
  • Challenges in AI Development (25:31)
  • AI in Production and Product Integration (28:43)
  • Difficulties in AI Feature Prediction (33:38)
  • Deterministic template selection and AI model uncertainty (36:21)
  • Infrastructure for AI experimentation and evaluation (40:11)
  • Consolidation and competition in the data stack industry (42:27)
  • Data gravity, integration, and market dynamics (47:12)
  • Enterprise adoption and the bundling and unbundling of platforms (51:03)
  • The open source databases and the middle ground (53:18)
  • Building successful open source businesses (57:00)
  • The fun approach to product launch video (1:01:14)
  • Final thoughts and takeaways (1:03:15)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

(upbeat music) - Hi, I'm Eric Dodds. - And I'm John Wessel. - Welcome to "The Data Stack Show." - "The Data Stack Show" is a podcast where we talk about the technical, business, and human challenges involved in data work. - Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. (upbeat music) - We're here with one of my favorite guests we've had on, Barry McCardel from Hex. It has been a while, Barry, since you've been on the show. How long ago were you on? - I don't know, a couple years ago maybe? - Yeah, that's crazy. That's been way too long. Well, for those who didn't catch the first episode, give us a quick overview of who you are and what Hex is. - Yeah, well, thanks for having me back on. Hex is a collaborative platform for data science and analytics. We built it largely just, it's an incredibly selfish company, really. It's just we built the product we always wish we had. I've been a builder and user of data tools my whole career, and my longest stint was at Palantir, where I met my co-founders and a bunch of our team and got to sort of, you know, just build a lot of different data solutions for a lot of different data problems. And, you know, Hex is built to be this product. We kind of call it like this multi-modal product that is able to bring together our teams and workflows and sort of unite them in one place around being able to work with data in a much more flexible, fast, and integrated way than the tools that we had sort of struggled with before. We can go deeper into that, but that's the high level. - All right, so one of the topics I'm excited about talking about is AI and BI together. So let's make sure we get to that topic. What other topics do you want to cover? - We talk about a ton of things. I think it is a very interesting time in the data stack as we're recording this. The Snowflake and Databricks conferences were just over the last few weeks. So it is very interesting just being there and sort of a good chance to check in on where data is going. I think the AI topic is super interesting and rich, tons we could cover. - Yeah, yeah. Awesome. Well, let's do it. - Yeah, sounds good. - Okay, Barry, there's, I mean, a million things I want to cover, but can you give me, let's just say, I didn't even look this up, shame on me. I didn't even look at when we recorded our last episode, but let's just say it was a year and a half ago, which is probably directionally accurate. What are the like biggest product updates in Hex in that time period? Can you go rapid fire? - Oh my gosh. (laughing) - Wow. - You guys ship so often that, we said, that was a very unfair question. - No, no, no, it's good, it's good. - At least give me a chance to pull up the release notes. (laughing) No, it's good. I, you know, I've got it all in my head somewhere in there. - Yeah. - Yeah. Look, I think longitudinally like, you know, we started very deliberately wanting to build a fast, flexible product for data work that sort of filled this gap that we felt was really around, you know, you go into most companies and you see a data stack, and most companies at a certain size will have, you know, a sort of BI tool that's dashboard centric. - Yep. - And that's all well and good, but I think at every data team I've been on or been sort of exposed to, like 80, 90% of the data work was happening outside of that. 
You know, it's in some menagerie of like, you know, random SQL snippets and Python notebooks and spreadsheets and screenshots of charts and PDFs of decks sent in emails is like the way to communicate. - Yeah. - And I just felt that pain so viscerally and sort of built Hex to help solve that. The long-term vision was, you know, increasingly unify and sort of integrate a lot of the workflows. The way we've always talked about it internally is we wanna build that sort of front end of the data stack where you don't have to jump around between tools. And both as an individual, you can bring workflows together or your workflow can be more sensible because it's together, it makes it easier to collaborate as a team and then expose this to more people in the org. So we talked a year and a half ago, I think we were pretty far down that path, you know, just in terms of maybe the first era of that. - Yep. - I think people thought of Hex, and maybe still largely do think of Hex, as like a notebook product, you know, almost like a super flexible, fast product for that. I think as we've expanded, we've grown, there's a bunch of things. So about a year and a half ago, we introduced our AI assist features. I think we were pretty early to that. We'll dig in a ton on sort of how we've done that and where we think that's going. But that's been awesome. They're called our Magic features and that was a huge thing. We'll go deeper on that. We've built a whole suite of no-code tools in Hex. I mentioned this word earlier, multi-modal. It's kind of a buzzword in the AI space right now. But it really just means like being able to mix these different modalities together. So, you know, in Hex, for a long time, you've been able to mix like Python and SQL. Like that was actually, it maybe sounds prosaic, but it's actually like pretty profound in terms of a lot of people's roles. - Definitely. - We were one of the first to be able to bring that together really fluidly. But since then we've integrated a lot more. So we have a whole suite of no-code cells like charts, pivots, filters, writeback cells. And now, you know, a few weeks ago, we launched a bunch of stuff that's effectively bringing like spreadsheet functions into our table cells. So you can actually do a full end-to-end workflow in Hex now, fully no-code. We've built a bunch of tools for governance, whether it's data governance or sort of reviews, like git-style reviews on projects, endorsements. I don't know, it's a long list. - Yeah. - But effectively, the focus has been how do we expand the number of people who can participate in these workflows while continuing to make the product more and more powerful for sort of that core data team that's really at the center of the decisions companies are making. - Yup. - Love it. Yeah, I actually had this thought when you were saying screenshots of dashboards. And I was like, how often is it Slack? Literally a screenshot in Slack, like that's someone's main source of data. - Oh yeah. - Literally just a screenshot. - I think it's one of those things where actually, I wonder, I think the majority of data is probably consumed via static screenshots. - Yeah. - Like if you just back up and you think about like charts in decks, charts in Slack, charts pasted in, you know, like it's gotta be. - Totally. - I think it's, you know, in some ways, that's a very old problem. - Yeah, yeah. - That's still like eminently unsolved. - Yeah, yeah, totally. 
- But I think there's actually a reason behind that. I think part of the reason is like, say it's for an executive, they want like a signature behind it, right? They want like, this is certified correct per X analyst as of this day, you know? Like they want some kind of, yeah, sign-off. - Yeah, I think it's actually a really interesting segue into some of the AI stuff because I think that, you know, you just said they want like a signature from an analyst on it, right? And it's like, yeah. - Yeah. - I think it's something I've thought a lot about with like how AI is going to show up. I think there are these really reductive takes like, oh my gosh, I just saw GPT for the first time and I saw that it can write SQL, so it's going to be like an AI data analyst. I think there's a bunch of reasons that is not going to happen, or it's not going to happen overnight. But my favorite sort of weird reason is like culpability. - Yeah, yeah. - I live in San Francisco and there are self-driving cars, like Waymos, all over the place. And I take them, I love them. But I think it's very interesting to me that they still live in this like regulatory purgatory, which is like every day all across America, like legions of eager, distractible 16 year olds get driver's licenses. - Yeah. - And like self-driving cars, it's like a multi-year slog to get them like approved. Even though like, any objective observer would look and be like, a Waymo is less likely to hit and kill someone than like a 16 year old. - Right, right. - But like, I think it's like a societal justice thing actually that's like, we know what to do with a 16 year old who like hits and kills someone. And it's like, some version of our justice system is set up around that. - Yeah. - It's like, what do you do when a self-driving car inevitably, just like, you know, they drive enough miles, there's going to be accidents. - Yeah. - What do you do with the self-driving car? - Totally. - Do you put it in like robot prison? Like, yeah, yeah. - And I bring it back to data, like, I think about this with like AI data analysts, AI data scientists, there's certainly a lot of companies that market themselves that way. It's like, if I'm like, hey, should we increase prices? And I'm like going to do an analysis on that. And I ask the AI to go analyze a bunch of data on that. And I get charts, like, who's behind, who's standing behind that analysis? If the price increase doesn't work or whatever the decision is, like, you know, I think with a human analyst or a human data scientist, like someone's credibility is on the line. And there's like a reporting chain of people whose credibility is like on the line. - Right, right. - And with AI, well, what do you do? You like fire the bot? I guess you turn off whatever AI software you use. Like, yeah, it's just a funny thing. And I think our society's actually going to have to learn how to like develop this tolerance or like way to handle the defect rate or the inevitable, like, stochasticity of these AI systems in a way we're just like not well-equipped to do today. - So AI culpability in the context of like data and analytics, if you have an end user who's like engaging with data who is not an analyst, right? They don't know SQL, they don't know Python. - Yeah, the mythical business user. - Exactly, yes, yes. 
And this mythical business user in this like mythical company where everyone uses data to drive, you know, every decision that they're making. But do you see that creating a lot of extra work for data teams at companies where, you know, you sort of implement, 'cause I mean, maybe these things aren't even like fully real, right? As much as the marketing would lead you to believe. Like again-- - What, are you saying the marketing is selling something that isn't there? - Startup marketing is selling capabilities? 'Cause I went on this website and it says there's an AI data scientist. Are you telling me there's not-- (both laughing) - Like, you just. - Is that going to create more work for data teams? Or is it like, I don't know, does that make sense? Is it a question? - Yeah, no, it does. - Like because, especially for critical stuff, if it's fuzzy, who cares, right? Like if it's directionally accurate. But a hallucination can be catastrophic in certain-- - Well, I think there's a few things. In some ways, this is a really new topic with AI. In some ways, it's like a very old topic of like, self-serve generally, as a pattern, in sort of like the analytics market is like, okay, if you set a bunch of people loose, there's going to be a defect, right? Like, you know, it's going to be like, well, you didn't use that metric correctly. And that's where people talk about semantic layers and governance, and we've certainly done a bunch there and there is more we're going to do. But it's kind of the same. I think it boils down to effectively the same problem with AI, which is like AI will also sometimes misuse the data. It's like, how much context can you give it? And, you know, whether it's semantic layers or like data catalogs or whatever, like the kind of pitch of those products, like pre-AI, is like, well, this is a place where you can put a bunch of context and guardrails around how humans can use this data. I think it's effectively the same pitch with AI, is like, how do you do that? And like, it's why we've built a bunch of those tools into our Magic AI features with Hex, which is like, you can incorporate, like automatically sync your dbt docs or your column descriptions from Snowflake or BigQuery, or you can enrich it within Hex itself to give context to the model. But does that mean it's 100% accurate? No. When we do our own evals, our own experimentation, our own internal stuff, like it does dramatically increase the level of accuracy and its ability to use that. Is it perfect? No. And so the question you're asking is like, when those defects happen, does that create work for the data team? I think there's maybe some version of like equilibrium of like, you're saving them a bunch of time answering random questions, and then they're having to deal with that. The way I see it is like, I think the best version of self-service of any type, whether it's AI augmented or not, and I really think this is in some ways very profound and in other ways pretty incremental, is like, the best version of this is you're not replacing the data team. You're taking a bunch of the sort of like low-level stuff out of their queue, so they can focus on the really intensive deep dive stuff. And in many ways, you know, with that self-serve stuff, of course, I'm the CEO of a company that builds a product and wants to sell it to people, but like, it's really what we built Hex to do, which is like the deep dive work. We wanna be the best way for data teams to get to the complex answers. Yep. 
And I think it's an interesting question of like, does self-serve, AI augmented or not, create, give them more time for that? Or does it just create more work trying to cover all of that? And that's, I actually think there's no one right answer. I think it comes down to a lot of like, the way team structure stuff works, the style they want, what their level of comfort is on people sort of coming up with their own answers, and different people are different on it. I don't think there's one right or wrong. Okay, John, I know you have a ton of questions, but can we get like, can we get down and dirty in talking about how you built your Magic features and maybe other stuff that you built? Because John and I have actually prototyped a bunch of different like LLM applications, AI apps, if you want to call them that, and it's just way more difficult than, you know, the startup marketing would have you believe. We talked before the show. There's this like typical like, hey, I got this working in development, now let's productionize it. There's that typical workflow. And with AI, I'd say it's orders of magnitude more between like, this kind of works versus productionizing. Yeah. And for more context, this morning, I knew we had a podcast recording. I forgot that it was with you, but coincidentally, I literally pulled up Hex and I was just doing some experimentation on our own RudderStack data with the intention of using Magic and playing around with it. And it was great. Like it was- - You were about to go there. - Yeah, I had a target table. I had like a set of questions and I was like, okay, I'm going to go like, I'm going to go fight with this AI thing to see, you know, like man versus machine. - And you won? - I don't know, we both did. - It was great. It was great. I mean, down to like the fifth decimal point on one thing, I was like, this is awesome. - Oh, awesome. I'm really glad to hear that. - Yeah, it was very cool. But I also know that that is non-trivial from a product standpoint. Like it's so- - Yes. - It is in fact, like, I mean, we haven't talked much about this, John, but like, when you do that, it's so simple that it's almost an unhealthy level of obfuscation of what's actually happening under the hood to make it that good. Which is part of the, you know, magic, I guess, not to be too cheesy. - Yeah, that's right. - Yeah, I mean, well, again, thanks for saying that. We've put a lot of work into that. So let's, yeah, we should have a very practical conversation about this, 'cause I think that the big thing over the last couple of years is like, these models have been working. They've been working really well on doing a bunch of things. And it's easy to look at them and it's easy to set up a quick demo with the OpenAI API. The, like, actually building a production AI system is a much different thing. It's turning out that it's really well suited to certain tasks and not others. I mean, there's a bunch of stuff here. I think that the pattern we're using under the hood is something I assume a lot of people are probably familiar with, which is retrieval augmented generation. As far as we can tell, 90-plus percent, some really high number of AI apps today are using RAG as opposed to like going and spending a bunch of time on fine-tuning. - Right. - And under the hood, we're using base models from OpenAI and Anthropic and kind of built it to be pretty modular models-wise. And the magic really is in the context you're providing it. 
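To make the "the magic is in the context" idea concrete, here is a minimal, hypothetical sketch of a RAG-style prompt builder for SQL generation: it gathers schema metadata, synced column descriptions (for example from dbt docs), and upstream cells, and hands the assembled context to whatever base model sits behind a simple callable. The data structures, names, and ranking are illustrative assumptions, not Hex's actual implementation.

```python
# Hypothetical sketch of RAG-style context assembly for SQL generation.
# Not Hex's code: the structures, the trimming, and the model callable are assumptions.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ColumnDoc:
    table: str
    column: str
    description: str  # e.g. synced from dbt docs or warehouse column comments


@dataclass
class ProjectState:
    schemas: dict[str, list[str]]                          # table -> column names
    column_docs: list[ColumnDoc] = field(default_factory=list)
    prior_cells: list[str] = field(default_factory=list)   # SQL/Python the user already wrote


def build_context(state: ProjectState, mentioned_tables: list[str], max_chars: int = 4000) -> str:
    """Collect schema info and upstream cells, preferring tables the user @-mentioned."""
    parts: list[str] = []
    for table in mentioned_tables:
        cols = state.schemas.get(table, [])
        parts.append(f"Table {table}: columns {', '.join(cols)}")
        parts.extend(
            f"  {doc.column}: {doc.description}"
            for doc in state.column_docs
            if doc.table == table
        )
    # Recent upstream cells carry intent: joins, filters, metrics already in play.
    parts.extend(f"Upstream cell:\n{cell}" for cell in state.prior_cells[-3:])
    return "\n".join(parts)[:max_chars]  # crude budget; a real system ranks and trims more carefully


def generate_sql(llm: Callable[[str], str], question: str, state: ProjectState,
                 mentioned_tables: list[str]) -> str:
    """The model is just a callable, so swapping base providers stays a small change."""
    prompt = (
        "You are a careful analytics assistant. Respond with a single SQL query only.\n\n"
        f"{build_context(state, mentioned_tables)}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

The point made in the conversation is that most of the leverage lives in what goes into build_context, not in the final model call.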
So it's one of those things like right place, right time. I didn't have like gen AI on my bingo card when we started the product. - What? - Like it turns out the product format of Hex is actually pretty sweet for this, which is that sort of notebook UI, the core UI that we built Hex around. We're kind of big believers in that. We don't think of the product just as a notebook, but the UI pattern works really well. And it works really well for AI, which is that you kind of have a bunch of upstream context. I mean, you can just start de novo using AI to generate stuff and that's great. And the context we're gonna use there is context we have about your schemas. You know, column names, sure, but also we can go and look at things like all the column descriptions. Increasingly we're tapping into other queries you've written and trying to pull up like more context based on your actual workspace. But, you know, as you're actually working through a project, it's a tremendous opportunity for us to sort of pull in all of that upstream context. If you've been querying into a certain table already or you've been using certain columns or you've been joining things in a certain way, that's all context we can provide the model. - And there's some really nice UI things we've done as well, like being able, and this is the other hard part about AI, is like getting the UI right. I think there's a lot of, we're in this very chatbot-centric phase of UI design right now, where like the default way to build an AI product is like basically copy-paste ChatGPT. - Yeah, exactly. - I'm a believer, and pretty insistent, that all of SaaS is not just gonna converge to chatbots. Interesting. In terms of like the UI. - Google had it right with their just single field. - It's all back to that, right? - Yeah. - But even, yeah, I don't know if you guys caught the Apple Intelligence stuff. I think it's actually really interesting that Apple's done a ton of gen AI work and none of it is chat. There are other, thoughtful ways to incorporate this in your product. Anyway, you know, even things like being able to @-mention tables or data frames, there's a bunch of stuff that just like helps give context and also helps give confidence to the user. Like, I can instruct this thing. One of the interesting things designing the UX for AI is it's not just like an intuitive UX. There's a really subtle thing of helping the user form a world model of what the AI knows. And this sounds so anthropomorphized. - Yeah. - But I think when you're using an AI product, a lot of times you are kind of in the back of your mind kind of thinking like, what instruction can I give this? What is it? - Okay. - Right. - What does it know about? And I think being able to expose more of that, you mentioned obfuscating it, Eric. I agree. I actually think we want to expose more of that to users to help fill out that world model of like, what does this model know? What can it do? How might you want to modify your instruction to get the type of response you want? - Yeah. - That's all really hard from a prompt perspective. It's also, I think, really tough to get right, a lot of hard work to get right, from a UX perspective. - So I have a question. How do you benchmark? So all these models are changing all the time. So say that you're like, all right, we want to use the latest, you know, Claude model. How do you benchmark between models? 
I feel like that's a pretty difficult problem. - It's an incredibly difficult problem. And it's actually, I'm glad you brought that exact example up 'cause we're literally doing that now, testing out the newest Claude model. - Yeah, 3.5 or whatever. - Yeah, 3.5 Sonnet. But you know, you read the announcements, the blog posts about these models, and they all sound like, you know, they're God's gift. - Yeah, that's great. - You know, the benchmarks look great and all that stuff. You have to benchmark it yourself. And this is a term that's called evals, which is kind of just a very specialized form of tests. And we've built a pretty extensive eval harness. So there's open source stuff. So there's like the Spider SQL benchmarks, which is sort of an open source set of benchmarks around text to SQL. And then there's a lot of our own evals we've written as well. And, you know, for us, our Magic tools don't just generate SQL, they generate Python code, they generate charts, they generate chains of cells. You know, you ask a certain question, you wanna get a SQL query and then a chart after that. And so we have evals built for a pretty broad set of this. We've done a lot of internal mechanical turking of like building these eval sets out. And what it lets us do is quickly run experiments in our experimentation framework we've built internally, called Spellgrounds, which is basically we can very quickly say, okay, great, I wanna test this new model out. Point Spellgrounds at that model, have it run a sample or all of the evals and then get back a result. So we actually see, based on different types of generation tasks, whether it's SQL generation, Python generation, chart generation, chain generation, text generation, whatever the task at hand is, retrieval stuff, how good is it at those? And what's really interesting, even upgrading from GPT-4 Turbo to GPT-4o, you actually saw certain things, it was much faster, but you also saw tasks where it got better. You also see tasks where it got worse. Then you start thinking like, okay, do we have to do some prompt tuning for this? And you get in these iteration loops of like, okay, wow, it got worse at chart gen. Is there something about the model that's doing that? And then, just taking a step back, I mean, as somebody who's been building software for a while, this all is so nuts, like it is such a wild and primitive way to program. We're like increasingly used to it. Here I am on a podcast, like talking with authority about how we're doing evals and prompts. And then you actually look at what you're doing when you're looking at this shit. It's like, okay, I'm yelling at the model this way and I need to yell at it in a different way. I need to add another line that says, you will not respond with markdown. You're like, yeah, I'm talking to it like it's a child. - Yes, like it's a child: you will not include Python in your response when I ask for SQL. Like, I'm like, it's very understanding of it. - Yeah, and there's certain stuff where it's like that scene from "Zoolander" where he's getting brainwashed to go like kill the Prime Minister of Malaysia. It's like, you're like brainwashing these models. You're like, you're a world class analyst. - Totally. - You're super smart. - It's like, you're super smart. - Right, you use CTEs for your queries. - Yeah, it's weird, man. And here we are building a bunch of like expertise and infrastructure and tradecraft on how to do this. 
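As a rough illustration of what an eval harness like the one described can look like, here is a toy sketch: cases tagged by generation task, a check function per case, and a per-task pass rate so two models can be compared side by side. This is a hedged sketch of the general pattern, not Spellgrounds itself; every name in it is made up.

```python
# Toy eval harness sketch: run a model over tagged cases and report pass rates
# per task type. Illustrative only; not Hex's Spellgrounds.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task_type: str                   # "sql_gen", "chart_gen", "chain_gen", ...
    prompt: str
    check: Callable[[str], bool]     # e.g. run the SQL and compare result sets


def run_evals(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, float]:
    """Pass rate per task type for one model."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for case in cases:
        total[case.task_type] += 1
        if case.check(model(case.prompt)):
            passed[case.task_type] += 1
    return {task: passed[task] / total[task] for task in total}


def compare_models(models: dict[str, Callable[[str], str]], cases: list[EvalCase]) -> None:
    """Print a per-task scoreboard so regressions (say, chart gen) stand out when swapping models."""
    for name, model in models.items():
        scores = run_evals(model, cases)
        print(name, {task: f"{score:.0%}" for task, score in scores.items()})
```

In practice the interesting part is the check functions: for SQL generation that usually means executing the query against a fixture dataset and comparing result sets rather than string-matching the generated text.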
But I can't avoid the feeling of like, we're gonna look back on this whole era of using AI and be like, well, that was weird. - Yeah, right. - The point where I had a very similar, you know, you sort of step back and try to objectively look at what you're doing, was when, do you remember we were working on a content generation project or something? - Yeah, right. - And John was hammering on this prompt, hammering on this prompt. And I was like, okay, this is crazy. And then it became like multiple steps of prompts to generate prompts, which was, I was like, that was the moment for me. It was like, okay, you're using an LLM to generate a prompt. - Oh yeah. To create a persona, to create a persona that can then generate like a better prompt. - We go even crazier, I'll take you another level of sort of craziness on this. - Like meta. - We have adopted a strategy, we'll publish a blog post about it, for one of our AI operations around editing, which is an AI-as-a-judge pattern, which is basically you have the LLM generate something. And then you have another LLM call look at that response and judge whether it's accurate. - Like tree, what is that? Like tree of thought, is that what that's called? - I don't know, there's a term for it. - There's chain of thought. - Which is also, it's different, but it's also a really interesting and weird thing, which is if you tell the model, like, think step by step, you get better reasoning abilities. And it's like, why? - Yeah, yeah, yeah, yeah. - It's like very spooky. And there's technical explanations for it. I listened to this talk by one of the guys who co-authored that paper, the original paper about that. And there's people who have theories on why this works, which is like by telling it that, it will actually spend more tokens, which basically makes it like think more, and you're kind of forcing it to break things down. And it's not that it actually has like reasoning per se, but by like forcing it to spend more tokens on the task, you're basically making it think more. And like, then that opens up this whole thing around different model architectures beyond transformers, which are undergirding what we think of as large language models today, which basically spend the same amount of thinking, reasoning, compute, however you want to frame it, across the whole generation, versus other model architectures that are sort of still in the R&D phase that can more selectively apply that. And so you can even think about, are there patterns down the road where you could even, as part of the prompt, tell it what parts to think carefully about or be able to steer its reasoning more. And it kind of gets in this very weird hall of mirrors, but we use this AI-as-a-judge thing, which is separate, which is like, almost like calling the model in again to look at the work of another model call. Yeah, sure, yeah. And it gets really weird. And then it's like, what if the judge doesn't like it? It sends it back to do another generation along with comments from the judge about what it didn't like. And it's like kind of facilitating a conversation between like two model calls. It's like, wait, what are we doing? You take a step back. Totally. For the entire history of software engineering, you're basically used to these very deterministic things. Yeah, totally. In fact, any non-determinism is like a bug. Like it's like, okay, I'm gonna write this test. Yeah. Right. 
It is a unit test: when this function is called with these arguments, it should respond with this response. Yeah. Every time, and if it doesn't do that, something's broken. And now it's like, well, you know, sometimes it dreams up new things. Yeah. Yeah, right. I had a moment like that. This isn't related to AI, but very similar feeling when I realized, like, you know, we have these models talking to each other. But I was taking my son somewhere, doctor's appointment or something, we get in the car, my phone automatically connects to Bluetooth. And I don't know whatever setting, but it just automatically starts playing whatever, you know, audio I had been playing. And so, of course, that morning at the gym, I had been listening to, you know, sales calls on Gong at 1.75x. And so it starts playing. And these, you know, it's like rapid fire discussion. Like chipmunks. And yeah, and my son was like, are you on a call, daddy? And I was like, oh, no, like, you know, and I pause it. And he's like, oh, then what was that? And I was like, I was listening to a recording of other people that daddy works with on a call. And he was like, I mean, he just paused for a second. And then he was like, did they talk that fast in real life? And I was like, no, I speed it up so that I can listen to like a lot of calls in a row. And so he was like, you are listening to other people that you work with on a call that you weren't on, really fast. And I'm thinking, I was just like, yeah. And he's like, that is so weird. Yes, son, that's product marketing. That's how I know you're a product marketer. 'Cause you're listening to sped-up Gong calls at the gym. Yeah, yeah. You're a 10x product marketer, or 1.75. 1.75. And that makes him a 10x product marketer. That's the key right there. I think it's the same thing where you step back and, with an objective view of looking at what you're doing, you're like, this is weird when you say it in English. Yeah, when you think about it, it's weird. Yeah, I mean, I agree. And that's, I think going back to the topic of like, getting things into prod, it's like, I think we learned really fast after ChatGPT that it's not hard to get cool AI demos working. Like you had like a couple YC batches in a row that were basically like, (laughs) wrappers around GPT. It's much harder. And I think what's interesting to me is to look at some of these apps as they're growing up. And I'm fortunate to know the founders of a bunch of these, both within data but outside as well, where it's like an AI-first thing, you know, where you're going to say, great, we're going to build this as just a wrapper around GPT and then evolve the thing, versus not. You know, there's a bunch of these now that are like AI support. Yeah, yeah, companies, right? Like not within data, you know, customer support. Yeah, yeah, yeah. You get into it and you're like, yep, there's clearly a huge opportunity to change the way customer support is done. But a lot of these companies may well have to rebuild a lot of what the last generation of support companies did. It's not clear to me that these startups won't have to effectively rebuild Intercom or Zendesk. Right, yep. And, you know, it's a question of like, can the Intercoms and Zendesks get good at this faster than the startups can get good at that? And we can talk about that in the data space too. 
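Circling back to the AI-as-a-judge pattern Barry described a few exchanges ago, here is a minimal sketch of that loop: one call generates, a second call critiques, and a rejected draft goes back for another pass with the judge's comments attached. The function shapes and the retry format are assumptions for illustration, not how Hex actually wires it up.

```python
# Minimal sketch of an AI-as-a-judge loop: generate, critique, regenerate with
# the judge's comments. Hypothetical structure, not Hex's implementation.
from typing import Callable

Generate = Callable[[str], str]                   # prompt -> draft
Judge = Callable[[str, str], tuple[bool, str]]    # (prompt, draft) -> (accepted, comments)


def generate_with_judge(prompt: str, generate: Generate, judge: Judge,
                        max_rounds: int = 3) -> str:
    """Return the first draft the judge accepts, or the last attempt."""
    draft = generate(prompt)
    for _ in range(max_rounds - 1):
        accepted, comments = judge(prompt, draft)
        if accepted:
            break
        # Feed the critique back in: effectively a conversation between two model calls.
        draft = generate(
            f"{prompt}\n\nPrevious attempt:\n{draft}\n\nReviewer comments:\n{comments}"
        )
    return draft
```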
But what I've really come to learn and appreciate is how much, how hard it is to get this AI stuff working in prod and how much it is dependent on having the rest of the product. Because we could not get the quality of what we get with Magic without having a lot of the other infrastructure and product we built. Not to mention millions and millions of lines of code, and hundreds of thousands of projects that people have built in it over the years. We're not training on those, just to be very clear. There's no IP leakage issue. But even just having the product and the context that organizations have built up over time is really the thing that's worked well. So there's a bunch of AI-specific tradecraft, but then it's like, how are you incorporating the rest of the platform? That is really important. - What have you tried that didn't work? You know, where it was just like, we have to stop going down this road of productizing-- - There's a bunch of things. And I think one of the interesting things in those moments is you step back and you say, does this not work because of the models? And like maybe GPT-5 or 6 will be better at this? Does it not work because we don't have the prompt, you know, just right? Maybe we're an amazing prompt away from cracking this, or tuning the judge. - Maybe you need an AI jury too, I don't know. - Yeah, yeah. That's it. - AI execution, or, you know, sometimes the model-- - Yeah, if vigilantes come about-- - Vigilantes, that's a whole great story. - Justice system, yeah. - Then bounty hunters, on and on and on. - You'd have to give it the context of literally the entire justice system. - Yeah, that's it. You know, okay, we'll pull out of that one. But do we not have the infrastructure ready? So like, there's things we've tried where it's like, okay, we actually need to go build up some expertise or infrastructure and be able to do a different type of retrieval to have this work. So as an example, there's a feature we have in Magic that lets you generate basically chains of cells. And the first version of it that we sort of beta tested just told the model, generate any number of cells you think necessary to complete this task. And it would go and do that, but you would have this drift down the chain of cells. So just conceptually, you can imagine asking a question and having it generate, like, a SQL cell that retrieves data and then a Python cell that operates on that data and then a SQL cell that reshapes that data using SQL and then a chart cell that visualizes that. You'd see it make some really weird decisions and we were getting down this path of like prompt engineering it, being like, oh no, like don't make a bad decision, like think harder about it. And like, it would get better at certain things and worse at others. And eventually we fell back to like, okay, actually the predominant pattern is actually one of a few templates. And we're actually going to sort of hard code those templates and have the AI select amongst those templates. Now, when I think about that, I go, you know, the template thing feels brittle, it feels limited, and obviously to me, the long term should be unconstrained chain generation. Like that is, I think, an almost obvious thing that should work. 
The question we grapple with is like, you know, maybe Claude 3.5 will be good at it. Maybe GPT-5 will be good at it. Maybe we just weren't doing enough. And even since we worked on that, there's a bunch of new techniques people talk about, like self-reflection. We could pull in our AI-as-a-judge strategy, and we think that there's some ways that could work better. And so it's really hard to know. And again, like I think we've been, my team and I have been building software, data software, for a long time. We can stare at most features and be like, a few sprints, you know, a quarter, whatever, you can get an estimate pretty quick. I've got, you know, just enough data points. I can read an RFC, and if an engineer is like, this is going to take six months, I can be like, no, it's not, no, it's not. - Yeah. - But with AI features, there's things that we've looked at and been like, that's going to be hard, and it basically works overnight. And there's things that we've looked at and been like, that should totally work. And it's like, this might be theoretically impossible. Like, right, it's difficult to know. Another example is diff-based editing. So like when you ask to edit in Hex today, like if you have a SQL query and you ask to edit it, we will go out and generate, or basically pass the current query, or the code block or whatever, with your edit instruction and some other context to the model and say, hey, edit this code block to reflect this. Great, we'll get a new block back. But we stream back effectively the whole new generation and we show it to you side by side. Now that feels slow 'cause if you've got like a hundred-line Python cell or query, you know, we're streaming the whole thing back. So it's like, well, can we ask it just for the diffs? That's something that I think anyone who's spent time with LLMs can like very easily imagine, like, okay, yeah, we'd send just the lines you wanna edit. But actually getting it to respond with the right lines and having them land in the right place is really hard. And it's like, okay, we almost gave up on that. We actually did for a bit and we came back to it with a new technique and then it worked. And so it's really hard to know what is actually gonna crack these things open, and the space is moving so quick. And it's not like the models even just get monotonically better at things. 'Cause I mentioned with GPT-4o, it actually got worse at some things. And it's like, where are the lines where you're betting that you're just gonna get this natural tailwind, a rising tide from the models getting better, and you can just kind of skate to where the puck is going and it'll catch up, versus like the other hard work you have to do? That is, I think, what makes building products in this space really tough today. I don't think people are talking about that enough. Yeah, maybe sometimes I wonder, is everyone else having a much easier time with this than me? - The answer is no. - The answer is no. - John and I. - Yeah, back to the template thing. I think there's an interesting concept here. You're mentioning like, oh, ideally it can just produce all these cells and it will just work. But I think there's kind of an interesting interface thing here where it's like, okay, is there something trivial I can ask of the human that drastically helps the AI? So maybe the human is like setting context for the cell, like SQL or Python. Like that's super trivial. 
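To illustrate the diff-based editing idea from a moment ago, here is a hedged sketch of the receiving end: the model is asked to return line-ranged replacements instead of the whole regenerated block, and those edits are applied locally. The edit format is invented for illustration; as Barry notes, the genuinely hard part is getting a model to emit line numbers that land in the right place at all.

```python
# Sketch of applying diff-style edits instead of streaming back a whole
# regenerated code block. The LineEdit format is an illustrative assumption.
from dataclasses import dataclass


@dataclass
class LineEdit:
    start: int        # 1-indexed, inclusive
    end: int          # inclusive
    replacement: str  # new text for that line range (may span multiple lines)


def apply_edits(source: str, edits: list[LineEdit]) -> str:
    """Apply edits bottom-up so earlier line numbers stay valid."""
    lines = source.splitlines()
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        if not (1 <= edit.start <= edit.end <= len(lines)):
            raise ValueError(f"edit {edit} falls outside the block")
        lines[edit.start - 1:edit.end] = edit.replacement.splitlines()
    return "\n".join(lines)


query = "SELECT *\nFROM orders\nWHERE status = 'open'"
print(apply_edits(query, [LineEdit(3, 3, "WHERE status = 'shipped'")]))
# SELECT *
# FROM orders
# WHERE status = 'shipped'
```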
But then with that context, the AI is like, oh, I'm doing Python. Yeah, like are there other examples of that? - Yeah. - If you use Magic today and you're like, generate me a Python cell to do something versus generate me a SQL cell to do something, it will follow your instruction and it will dutifully do that. 'Cause that's passed in as part of the context. - Yes. - Prompt. And this gets to, you know, where it's not just a single-shot GPT wrapper, like we have several different agent responses happening as part of that when you type that prompt, off what's happening under the hood. Like there's a step where we're selecting the template to use. And so that context you provided will help with that. - Is that deterministic, like selecting the template? - It's non-deterministic in that it's an AI response. Like we have the temperature turned down, but it-- - Okay. - Got it, yeah. - We pass it a list of templates. We pass it your prompt and we effectively say-- - All right. - Our engineers would be like, it's a lot more complicated than that, Barry, but we effectively say like-- - Right, yeah. - Yeah. - For the listeners, like you adjust the temperature, the way I understand it, for more creative, more varied responses, like that's higher, or more exact responses, that's lower. Yeah. - Yeah, that's right. - So it's set to more exact responses 'cause it's like one of a few templates. - Right, and because we have that temperature turned down, it's one of the things that when we do evals, it's, you know, if you think about it just from first principles, like if you're gonna do evals, you actually wanna make sure you're getting the same response effectively each time. So yes, I wouldn't call it deterministic, but it should be stable. - Yes. - Yeah. - Okay, for a certain model version is the way to look at it. But yeah, and then it's like, okay, well, like what if the model's uncertain? What if you haven't given it enough context? Can it respond asking for clarification? Well, it's like, that's a very interesting question. How do you-- - Yeah. - Are models good at expressing uncertainty? - Some aren't. There's other techniques you use to do that. You can have a judge weigh in. You can-- - Yeah. - Yeah, there's different ways to do this and these are really unsolved. And so, you know, we look and think about this all the time. And the nice part for us coming at this, again, I think just kind of right place, right time, is like, we're a product whose real core focus, and it will always be our core focus, is being incredible for data practitioners. We wanna be, you know, this awesome tool for people who are asking and answering questions, trying to do deep dives, you know, get to insights, things that aren't just turning out dashboards. And in many ways, you know, we can incorporate AI into that product, into our product, and it's okay for us to be a little wrong, because we're targeting people who would otherwise be in Hex writing SQL and Python, right? So like-- - Yep. - It's the same as like a copilot, right? Like, it is not perfect. We use it, a lot of people use it, or people use Cursor. There's a lot of other AI coding tools now. Those are not perfect. They can be a little wrong, but the assumption is the user knows well enough to-- - Yeah. Or even writing tools, right? Like you use a, you know, ask AI to help you write a blog post. - Yeah. - It's not actually perfect, but it's good enough. - Yeah, exactly, right. 
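Here is a rough sketch of the template-selection step as described: the model's only job is to pick one of a few hard-coded cell-chain templates, with a low temperature so the choice stays stable across runs, plus a guardrail fallback if it goes off-script. The template names and the callable signature are made up for illustration, not Hex's actual prompt or code.

```python
# Sketch of template selection as its own low-temperature model call.
# The templates and the llm callable signature are illustrative assumptions.
from typing import Callable

TEMPLATES: dict[str, list[str]] = {
    "sql_only": ["sql"],
    "sql_then_chart": ["sql", "chart"],
    "sql_python_chart": ["sql", "python", "chart"],
}


def select_template(user_prompt: str,
                    llm: Callable[[str, float], str],
                    temperature: float = 0.0) -> list[str]:
    """Ask the model to choose one template name; fall back if it goes off-script."""
    instruction = (
        "Pick exactly one template name from this list and respond with the name only: "
        f"{', '.join(TEMPLATES)}.\nUser request: {user_prompt}"
    )
    choice = llm(instruction, temperature).strip()
    # Low temperature keeps the choice stable across runs; .get() is the guardrail.
    return TEMPLATES.get(choice, TEMPLATES["sql_only"])
```

As discussed above, this is not deterministic in the strict sense, but it is stable enough for a given model version that evals over it stay meaningful.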
And so this has been really good for us, because we can iterate on these things. Like, we can be a little wrong and still be really good for users. - Yep. - Yep. - And get better and better at this stuff. And I think the question is like, how tight are you making those feedback loops? How good are you at judging whether you're right or wrong? How are you building the tradecraft? Like, everything we're building, you know, we run our own vector database. We didn't write the vector database ourselves, but we self-host the vector database. Yeah, and setting up the pipelines for it, and doing retrieval over it, and scaling that, and figuring out deployment for our single-tenant environments. - Sure. - It's like, that's really hard, but that's something we're good at now. And we can just compound as the tools get better, as the techniques get better, as the papers are published. Yeah, it's been really fun to see those learnings stack on each other. - Yeah. For listeners who maybe are just tuning in from the drive-through, just a reminder, we are on with Barry McCardel from Hex. We're talking all about how hard it is to build AI stuff. So I want to change gears, because you had a question about AI and BI, and I think that's a great opportunity to zoom out and just look at the landscape. Barry, you were just at Snowflake and Databricks. But before we leave the AI topic, one very practical question, and this is mainly for me and John. It's selfish, right? He started his selfish company. - Asking for a friend, yeah. - Yeah, it has to be for a friend. No, but also for the listeners. Are there any heuristics that you have developed through that process? And I'll just give you one example. You know, have you found that if you have to keep hammering on the prompt and getting really wild with that, is that, you know, okay, when that happens, like we tap the brakes because we need to step back and evaluate, or other things, right? Where you've sort of learned how to increase the cycle of experimentation by knowing when to stop or change your strategy. - I don't know that I have a simple answer to that. I think we've got a pretty amazing AI engineering team that I think have developed a lot of heuristics and tradecraft around that. I do think there's a point we've seen, certainly in real life, a point of diminishing returns on like prompt hacking, like, yep. - But one thing we've done that I think is important is really focus on setting up infrastructure for experimentation and evaluation. Like I think that, you know, you can do all the prompt engineering in the world in like a playground, but where the rubber meets the road is like, how does that work over a larger set of, yep, data points? And I think that's the fundamental problem of like, you know, getting something working for a demo dataset on your machine is one thing, getting it working in the real world for real user problems, whatever the space, data, customer support, I think, is much different. And your ability to build quickly, to iterate quickly, to understand and shorten your cycle time of experimentation is really predicated on the infrastructure you build up. So that's where I was talking earlier, like we've developed this thing we call Spellgrounds, we've built our own evals, like that helps us learn much faster and see like, are we actually descending a gradient on improving by incremental prompt improvements? Like we can edit a prompt and run it over a subset of our evals and just get, yep, feedback. 
Like, okay, is this actually working for real? And get that feedback. And there's times where it's improving things and there's times where it's not. And so I don't think there's a clean heuristic other than to say that, like, the only way to really tell is to have infrastructure for experimentation in good shape. - Love it. All right. AI and BI, John. - Yeah. Yeah. So I think coming off of conference season here, you know, we've got a lot of players that are consolidating. I think you mentioned before the show how we had these nice lanes, like say two years ago, like, all right, we're in the visualization lane, we're in the storage lane, et cetera. So maybe talk a little bit about the consolidation you're seeing and, you know, any interesting observations or predictions on does that continue? What does that look like? - Yeah. So, I mean, it's such an interesting time and we just spent a bunch of time talking about AI. But even if you set that aside, I do think we're coming through a cycle in the data stack. Here on The Data Stack Show, right? Like, that's right. - That's right, yeah. - You guys have actually had a really interesting longitudinal look at the evolution of it, right? 'Cause you've had a lot of guests on, you've had people on over time, you pay attention to the space, obviously. You know, what happens when a lot of capital flows into an ecosystem, you get a bunch of flowers that bloom and I think that's a beautiful thing. I mean, I think people can look at it cynically, but I know a lot of really smart people that built some really cool stuff in the data space over the last few years. And some of those things will continue on, some won't, but it's very, very interesting to see. I do think we're coming into an era of consolidation and it's not just like interest rates are going up or whatever. I think it's just like, we've learned a lot. Like, people have tried different patterns. There's things that worked, there's things that didn't, and I do think that there's a couple of dimensions to this. There's sort of like a horizontal dimension, which is at each layer of the stack, who's emerging as winners and losers, and what are the actual subcategories at that layer? Like, we kind of sit at the front end, the collaboration layer, you know, what are the actual divisions there? And, you know, if you look at like the metadata, governance, cataloging layer, what are the divisions there? Is there a standalone orchestration and transformation layer, you know, who wins there and how does that look? And then, at the bottom, you know, the companies running these big conferences and running the infrastructure. You know, you have the cloud hyperscalers, so Microsoft, Amazon, Google, and then Snowflake and Databricks are the big players, and then you have a long tail of other independent, you know, data vendors, you have like the Starbursts of the world, the Dremios of the world, and it's just this question of like, how does this actually consolidate? Are some of these sectors, categories, winner take all? Or can you have multiple winners, and what do you need to do? 
And I think it was very interesting being at the conferences and seeing, just walking the floor, talking to people, companies that two years ago, let's say at Summit, like, you know, Snowflake Summit 2022, were advertising like partnerships or talking about how well they work together, or they were next to each other, you know, who's next to each other and who's co-hosting parties together, are now like, oh, actually, we're gonna compete more than we thought, because you're kind of being pushed to do that. And this is happening at every layer of the stack. I think there's a really interesting question around like data catalogs, governance, metadata, like, does that all become one thing? Like, dbt has their Explorer view, you know, how far do they go with that? Where do standalone data catalogs live, where do standalone data observability platforms live? Like, I don't think anyone really knows the answer other than that there will likely be, just by count, like, fewer distinct, you know, fewer players, but also perhaps the actual categories will be more of an amalgamation than like these really fine subdivisions that we had in sort of the Cambrian explosion of the modern data stack. - It's always interesting to me to think about like Salesforce, for example, like really dominant in CRM and then, you know, HubSpot's got some of that market too. It's interesting for me to think like in this space, is there gonna be just somebody that's so dominant, like a Salesforce, and then like maybe like a number two and then like a long tail? Or do you think it'll be a little bit more like, 'cause historically like Oracle was really strong, SQL Server was really strong, you know? Then, you know, MySQL, Postgres. Like do you think it's more like that or more like a Salesforce winner-take-all, you know, smaller? - Well, let's ask a question of why is Salesforce so sticky and durable? Like, I don't know if you use the Salesforce UI. I do sometimes, I try to avoid it. - Try not to. - I know, no shade to my friends at Salesforce, but like I think they would probably also say that the UX design of like the Salesforce CRM app is probably not the thing that everyone's-- - Very well, it's like, moving from Classic to Lightning was like a giant mutiny. Their own users were like, we're not, we are not changing that. - And like a decade-long project, right? - Yeah. - And so anyway, I mean, why is it so sticky? I had the chance to ask a former senior Salesforce exec about this recently. And they were telling me, 'cause I was curious about this, and they were telling me, it's the data gravity. There's a lot of startups that have a nicer UI, but it's the fact that the data lives there and there's all these integrations with it and the industry, all the way from other tools to systems integrators and consultants, has standardized around that thing. If you kind of look at the data stack, like maybe the closest thing to that in terms of just sort of singular dominance has been Snowflake in the warehousing world over the last few years. And it's a question of like how durable and sticky is that data gravity and governance. It's like, what is there? You have other stuff there, but like, this is why this is an interesting conversation on Iceberg and open formats, is like, I think a lot of buyers see that, a lot of buyers have experienced being on the sharp end of this with Salesforce. - With Salesforce, yeah. 
- Absolutely. - Hey, hang on, I want more modularity. And a pattern we're seeing a lot of customers do is actually stand up like a data lake alongside, and I asked a customer at Snowflake Summit about this, like, oh, interesting, why is that the pattern? 'Cause with Hex, they're excited about, you know, we can have one project that can query from Salesforce, or excuse me, from Snowflake, and our data lake, which I think they were using Trino for. - Yeah. - And they were like, well, you know, it helps us get some of these costly workloads out of Snowflake, but it also, like, we tell Snowflake that we're doing it and then that helps our negotiating. - Oh, yeah, sure. - Yeah. - It was less about the actual, like, net, like, we moved this workload from here to here and it's this much cheaper, and more like, by proving that we can move a workload, we've established some leverage in our negotiation with our Snowflake rep. And so this is an interesting question, right? So you can see that the vendors at that layer, like, the last thing they want to be is a commodity query engine on open source format data on commodity blob storage. Like, that's a bad situation for them. And so then you start building a lot of other stuff to try to lock people in. And that sounds like a dirty word, lock-in. The more generous way to say it is, you know, you're trying to provide more integrated value for customers. - Yeah. - But customers see that coming a mile away. And so it is a question of, like, what the market wants and how much that power of, like, integration will pull, and that's a question for everyone in the data stack right now. - So my theory on it, then, is you're right, is there are enough people, and it's not just Salesforce. So you've got Salesforce, but then you also have like an SAP or an Oracle on the, you know, ERP side of things. So my theory is there's enough people out there in enterprise that want to avoid the, you know, huge lock-in and the implementation fees and the reimplementation fees and the upgrade fees and, you know, contract, all of the money they have to spend around these systems, to where I don't know if that model works again right now. Right? As far as whether a Snowflake or somebody could be as big as a Salesforce? - We'll see, yeah. - I don't know, look at Microsoft, right? Like, so Microsoft, you know, famously was on the sharp end of an antitrust loss. - Yeah. - Right. - You know, 20 years ago, more than 20 years ago now. But they're doing the same things, if you look at it now. - Yeah, they've got, you can go as an enterprise buyer, as a CIO, and sign one contract with Microsoft, one overall agreement, and have leverage in your pricing because it's bundled on everything from a Windows laptop to your Word license, to getting Teams thrown in for free, to, increasingly, they're trying to leverage that over now into AI, so you're also gonna have that same thing, not just your Azure compute, but, you know, you get a GPT-4 endpoint in your own VPC that you get nice economics on. And then while we're at it, let's throw in Fabric. - Yep. - Yeah, Fabric, cool. It's this one thing. It's, you know, it's better than Snowflake on this. It's better than Tableau at this. It's, you know, whatever. And you can bundle all that together. That is really attractive to an enterprise buyer. It's also really scary to enterprise buyers. You're all in with one vendor. I think it's a very real question of like how that tension plays out, and that's not new. 
Like, I don't consider myself super young anymore, but it is funny, I was chatting with my uncle, who was a really successful software executive in sort of the last generation. We were talking about this, and he was explaining how Microsoft had snuffed out markets in the 90s that I had never even heard of, companies I had never heard of. To him, it would be the equivalent of them snuffing out, I don't know, a Snowflake or something, something that's very common today, and I'd never even heard of it, wasn't even aware of it. This pattern has been going on for decades, it's going to continue, and there's this bundling and unbundling, and it's a very interesting time. - Yeah. - One thing that is interesting, though, and this is a question for both of you: in terms of the platforms, there's sort of a classic Snowflake-Databricks Battle Royale, and that's played out in a number of ways, where a lot of large enterprises run several of the big vendors. - Right. - Per division or per business unit? - Right, exactly. But at the same time, I think back a couple of years, and that probably was more real, where it's like, okay, we run these workloads on this vendor and those workloads on that other vendor. But every vendor is now building the infrastructure to run all of those workloads. And so I think it's more possible now, from a baseline technological standpoint, for there to be a winner-take-all than it was previously, right? But that's more of a question: do you think that's true? Or do you think the different flavors of the cloud platforms mean that large enterprises will probably always run multiple of the major vendors? - Well, if history is any guide, large enterprises will have a lot of everything. (laughing) I'll ask a customer, like, what do you guys use for data warehousing? And they're like, everything. You know, it's a Fortune 500 company. - Literally everything, yeah. - And not just one of everything, multiple of them. - Yeah, yeah. - And why is that? Well, maybe the company is the result of mergers, or maybe different divisions have chosen different things. And increasingly, I think it's very strategic, which is, we want to run different vendors because we don't want to be locked in. Multi-cloud is a very real strategy, and there are a lot of enterprises that are very bought into that path. In fact, I've not talked to many enterprise CIOs or CDOs who are like, yeah, our strategy is we're all in on one thing. So I don't know how it'll all shake out. And I think you can look at even the current markets that are reasonably mature. I would argue data warehousing is a reasonably mature market, and it is very interesting to observe that the two players that wind up getting the most airtime, if you set BigQuery aside for a moment... - Yeah, right. - Like Snowflake and Databricks, neither of them run their own metal. - Yeah, right. - They're both running on AWS or Azure or GCP. - Yeah. - And it's worth just observing that for a moment. Snowflake and AWS have an incredible partnership.
They go to market together; I was at Summit at the exec track event with the senior AWS folks and the senior Snowflake folks all there. But meanwhile, there are teams in each company that are competing on deals. AWS makes Redshift, they make Athena. So this is not even a new pattern in the data stack. And when I talk to folks at Snowflake, as an example, they're aware that they are increasingly building things, because they're worried about being a commodity query engine, that will compete with partners. But what's interesting talking to them is, yeah, we're on both ends of this ourselves. I mean, famously, Databricks and Microsoft had a really great partnership over the last few years. Databricks on Azure was a really big deal, and a lot of that made Databricks what it is. - Yeah, Fabric is ripping off a lot of Databricks. - Yeah, yeah, sure. And so I don't know, I think these people don't see it as winner-take-all, you don't think so either, and I don't think the data world has ever really been that way. But we'll see. - In my opinion, in the past, maybe 20 or 30 years ago compared to, let's say, 15 years ago, you did have a big market emerge in open source, like Postgres and MySQL, right? Because prior to that, it was mainly closed source databases. - Oracle, yeah, of course. - Yeah, Oracle and SQL Server, IBM DB2, for example. So that was all closed source. And then you had a huge surge, with Facebook, for example, of companies that proved, oh, we're going to run on an open source operating system, on open source databases. So that's a major change. Now it's hard to see people swing all the way back. It seems like there's got to be some kind of middle ground, where people aren't going to go all the way back to, yeah, we'll just be closed source, we're just going to go all in on Snowflake, we're not going to think about Iceberg. So because people don't want to go all the way back, I think it's less likely you get a single winner. - Yeah. Well, open source is a whole really interesting topic of how and where you build successful open source businesses. I've got a thesis, personally, informed by a lot of conversations with smart people, so this isn't entirely my own authorship. But you can build a successful open source business when it's a pain in the ass to scale the technology yourself, right? Like Spark. - Yeah, a little bit of a challenge. We use an enormous amount of Spark; scaling it ourselves was a nightmare. - Yeah. And that's why Databricks exists. Scaling Kafka is really hard, so Confluent exists. Any type of database, typically, is hard, and that's why the database vendors, Elastic and all, exist. When you look at certain open source technology, and there is some in the data space, that is not hard to scale yourself, well, okay, how are we going to make money on this? It's hard to make money on the actual open source tech itself; you have to make money on adjacencies. And without naming names, I think you see some vendors in the data space doing this. And I think you look at Iceberg as an example now, post-acquisition.
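As an aside on that point, here is a hedged sketch of why the format alone is hard to monetize: with an open table format like Iceberg, even a plain Python client can read tables straight off object storage, so the value tends to accrue to the engines, catalogs, and managed services around it. This assumes the PyIceberg library and a REST catalog; the catalog URI, warehouse path, and table name are placeholders.

```python
# A minimal sketch: reading an Iceberg table directly with PyIceberg, no
# proprietary query engine involved. Catalog URI, warehouse path, and table
# name are hypothetical placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lake",
    **{
        "type": "rest",
        "uri": "https://iceberg-catalog.data.internal",  # placeholder REST catalog
        "warehouse": "s3://acme-lake/warehouse",          # commodity blob storage
    },
)

table = catalog.load_table("analytics.orders")

# Scan straight from object storage into Arrow; any engine that speaks the
# format (Spark, Trino, DuckDB, and so on) could read the same table.
arrow_table = table.scan(
    row_filter="order_date >= '2024-06-01'",
    selected_fields=("order_id", "order_total"),
).to_arrow()

print(arrow_table.num_rows)
```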
By the way, side note, the announcement about them being acquired by Databricks went out during the Snowflake Summit keynote. And our booth was kitty-corner to theirs. The vibe at the Iceberg-- the Tabular booth, sorry, it was Tabular that was acquired-- - Yeah, yeah. - But the vibe at the Tabular booth was just so funny to watch. The employees were like, what do we do? Are we supposed to talk to them? Do we pack it up? We're kind of in enemy territory now. Are we, like, Databricks partisans now? A very funny situation. - Yeah, those moments are really hilarious. That's hilarious. Was Ryan there? I didn't see him this time; I saw him last year. - But it was funny to see, like, customers going by their booth and congratulating them, and the team doesn't know what to say. - Yeah, that's classic. - But, you know, for them, and I don't think it's a secret, they were not printing money themselves as a standalone. The question was, is Iceberg by itself a technology you could make money on, or is the money made on the query engine around it? So I think you've got good points on the open source thing. It has to flow from where value actually accrues in open source, and what that means for people's willingness to self-host or self-run things, or for that modularity. And I think the story with Iceberg, and the reason Databricks bought them, is you want control over that open format. It's a pretty clear tell that they want control over it. - Yeah, sure. - Because they've done such a good job of this with Spark, right? Anyone can ostensibly host Spark. The networking part of Spark historically was an absolute nightmare; that was the thing that made it really hard. And what Databricks did was very clever. They wrapped Spark up in an open source governance model where it was open, but they-- I'm using air quotes, which people can't see-- controlled it. Databricks is in full control of the Spark roadmap. The networking part they made modular, and then Databricks hosted Databricks Spark with, like, a superior networking module. - Right. - So you basically have an open source thing that's way harder to use, and they've made their version of it much more scalable. And I think you can see this coming from a mile away with the table format stuff. Clearly, whoever feels like they're able to control that technology is going to do a bunch of stuff to make it no less open on paper, but less possible for people to run and scale and utilize for themselves. And I think that's the thing to keep an eye on. - Yep. - Well, I literally can't believe we've been recording over the time limit, because we started early. This is what we get to do when Brooks is out. - I know, it's the best. - But okay, Barry, I have a burning question that's unrelated to anything data but is related to Hex. You've put out some amazing marketing material, one launch video in particular. So who wrote the script for that? Was there a team? Were you involved in that? How did it come about? I just need a little bit of backstory, because, and we were talking about this before the show, in the office, we have one physical office at RudderStack, we all gathered around the computer and watched it, like, three or four times. So can we get a little bit of backstory on it?
I'm very flattered. Yeah, I think you're referring to our spring launch video. We've put out a few of those, which people can find on our site. So we had a lot of fun with it. And the backstory is, when we do these things, there's a very standard, sort of Apple-keynote-homage style of product launch video that is very easy to default to. And with everything we do, we just try to have some fun with it. People can't see, but on the video here, you have, like, a boxed version of RudderStack. - Yes, the-- like a straight-up 1990s software box. - Yeah, yeah, straight from our Coalesce booth last year. - So we just kind of approach these things like, let's have more fun with this and not take ourselves so seriously, while still being very serious about the software we're making and the value we want to provide. So yeah, the video was really fun. I was involved very closely. I really enjoy that stuff, I have a bit of a background in it. I had a brief dalliance with thinking I wanted to be in some version of film production earlier in my life. - Ah, cool. Nice. You got that. - But we've got a great team internally that we just have a lot of fun with. There's kind of a core brain trust of a few of us that jam on these ideas all the time and throw out pretty unhinged stuff in Slack almost every day, and it gets refined down to a few things we want to do. Actually, the skit we did in the video, the sort of office-style skit, came from us struggling with what to name this. We were like, is it 4.0? We literally had that internal debate, like, what do we call this release? The beginning of the launch video has this kind of dramatized version of us struggling to come up with a better name than "spring release," which is very boring. - Yeah, yeah, yeah. - It's kind of leaning into that. We've got some more fun stuff coming. - Yeah, that's great. That's great. - You can have more screenings in the next few months. - Yes. No, I'm so excited. It was incredible work. Barry, thanks so much. We need to get you on the show more often. Let's try not to wait a year and a half until the next time. - Whatever you want, let me know. You know how to reach me. I'll see you guys. Thanks for having me on. - The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com. (upbeat music)