During the conversation, Dave shared his background, including his econometrics work at UPS during the 2008 recession and his tenure at Google Cloud, where he focused on BigQuery and customer-facing architecture in the gaming industry. The discussion covers the landscape of data warehouse products like Snowflake and Databricks, the complexities of cloud platforms, and the challenges of observability. They also delve into the cautious integration of AI in observability, emphasizing the need for better mental models and practical approaches, and so much more.

03 Jul 2024
(upbeat music) - Hi, I'm Eric Dodds. - And I'm John Wessel. - Welcome to "The Data Stack Show." - "The Data Stack Show" is a podcast where we talk about the technical, business, and human challenges involved in data work. - Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. (upbeat music) - Welcome back to the show. We're here with Dave Nguyen. Dave, welcome to "The Data Stack Show." We're excited to chat with you. - Absolutely, glad to be here. - All right, well, we know you work for Edge Delta in observability, but give us the brief overview of where you came from before that. - Oh man, well, if we want to go back far enough, there was a cold and snowy night in February of 1984 and a cry rang out at four in the morning, which is unusual for that time of year. But if we fast forward a little bit from there, I've been a lifelong geek and I've bounced around a number of different places, from doing econometrics at UPS headquarters in Atlanta to hopping around a few startups in Silicon Valley with some ETL software and some observability software. And then I was at Google Cloud for a number of years, doing all things there, both on the compute and on the data side. And now I'm finally here at a new startup where we're doing observability. So a little bit of this, a little bit of that. - Very cool. - Nice, so one of the topics we talked about before the show was your time at Google, and we talked a little bit about BigQuery. So I'm interested in digging in a little bit more there 'cause you were at Google, I think, during some of the crucial years where that, you know, came to be a really pinnacle product. So what are some topics you wanna dive in on?
- Oh man, you guys tell me what you wanna talk about, but BigQuery is definitely, in my opinion, one of the best, if not the best, data warehouse products that exists in the cloud, because it's the closest you can get to a SQL API with really not having to worry about any of the backend. - Right. - No knocks on Athena, Athena is great for what it does. Don't get me wrong, but if I'm gonna open another tab and start managing all of the S3 stuff and have all my Parquet files in just the right format, I'm not having the best day if that's what's happening. So BQ just makes it super easy to dump data in there at the appropriate time, in the right increment, in the right spot, and you can just go about and start querying it. Whether you're talking at the gig scale or the petabyte scale, it just doesn't matter, so I found that really slick. - Awesome, well, tons to talk about. Let's dive in. - Yeah, let's do it. - Dave, there are so many topics that we wanna dig into. I'm actually interested in your background: you studied economics and then right out of school, you did econometrics work at UPS. What did you do there and what does an economist hired by UPS figure out for them? - Ah, so it's a great question. I joined, actually I was hired to be part of one group and was transferred to a different group before my first day. - Yeah, that's nice. - But the way this worked is that I joined the forecasting team, and this is 2008 where we were coming into a giant recession, and the thing that this team was responsible for was predicting as far out into the future as possible, how do we know how much we need to ship, where, and when? - Great, so we need to maintain all these time series that give us forecasts around all these things. In the past, the models didn't actually have to be that sophisticated because UPS more or less tracked GDP, period, within a percent or two. So not that hard to predict. You could just sort of hit the button.
Until 2008. Wait a minute, suddenly some things aren't quite working, right? - So, the number looks slightly different. So, there was a guy who thought that maybe we could bring in some more econometrics into the forecast to figure that out. They hired a PhD in order to implement some of the key ideas, and they hired a grunt to do all of the work in Excel to make scale happen, and I'm not gonna say which role I served, and I don't have a PhD. Let's go with that. (laughing) - Wild, okay, so dig into that a little bit more. What did you, I mean, what did you find? How did you address the problem of, you know, sort of your core metric that had been used to forecast the business changing? - Oh my gosh, well, so we're almost thinking about it more on the operational side of what I was responsible for. So, we had a number of time series that we published as a team to the wider organization. This is before we were using a database. Like, someone passed around a CD of, you know, SQL Server 2005, like, whoa, maybe we should try this. - Yes, we were before that, and in that sense, we were sharing Excel files about what were the different series that we maintained. The PhD, who is still there and is doing tremendous, tremendous work as I understand it, her name is Iwanah Mizzare, and she's great. She would literally read academic papers and books and things that were published, and translate the different processes that were available into stuff for a time series that would matter. And so, that would work once. And then, I had to figure out how to do it for all of them. And this involved a lot of VBA, where I was very grateful for IntelliSense, no matter what level of intelligence you would consider 2008 IntelliSense to have. Just to help me keep moving and get going. But a lot of these operations were very annoying, right? So, you can't just fill down across the whole row because you've got month resets, right?
And you've got different moving averages, and you've got all different, like, little operations that, if I was doing it today, I probably would have functionalized everything and got out of that row. But I wasn't quite that smart. Younger Dave that had fewer, I'm gonna call these blonde hairs on my chin. Younger Dave that had fewer blonde hairs didn't quite know that. So, we were trying to do everything half in GUI land and half in VBA, and that led to a lot of files with a lot of very particular changes. And, you know, it was a job. - Yeah, yeah, okay, just out of curiosity, 'cause I wanna jump actually to the, there's so much to talk about with Google, and I know John has a ton of questions, but time series data at that scale is very interesting, right? And so, the tool side that you're talking about sounds, you know, particularly painful. What stack would you use today, right? - I mean, there are like time series databases, like Influx and other tools like that, that are really good, you know, maybe that's not the right tool for what you were doing, but what stack would you put together today to do that? - And before you answer that, what percentage of the time did you spend, like, waiting for Excel to become responsive again after you made a critical change, and you didn't hit save? - So, I'll start with the latter question first. None at all, because I was managing 250 Excel books. Why would I put these all in one workbook when I could do this copy and paste 250 times? - Got it, okay. - So, performance problems solved. - Yes, that is the original brute force. - Wow, okay, yeah, I wasn't expecting that.
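Editor's sketch: the "functionalized" version Dave says he would write today might look something like the Python below. It is only an illustration under stated assumptions, not the UPS team's method: the function names `month_to_date` and `moving_average` and the sample numbers are hypothetical. The first function handles the month-reset problem that breaks a simple fill-down, and the second is a plain trailing moving average.

```python
from collections import deque
from datetime import date


def month_to_date(points):
    """Running total that restarts whenever the calendar month changes.

    `points` is a list of (date, value) pairs in chronological order.
    """
    out = []
    current_month = None
    running = 0.0
    for day, value in points:
        key = (day.year, day.month)
        if key != current_month:  # month boundary: reset the accumulator
            current_month = key
            running = 0.0
        running += value
        out.append((day, running))
    return out


def moving_average(values, window):
    """Trailing moving average; None until a full window is available."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


# Hypothetical sample data spanning a month boundary.
series = [
    (date(2008, 1, 30), 10.0),
    (date(2008, 1, 31), 20.0),
    (date(2008, 2, 1), 5.0),
    (date(2008, 2, 2), 15.0),
]
mtd = month_to_date(series)
avg = moving_average([v for _, v in series], window=2)
```

Because each operation is a function of its inputs, the same code can be applied to every series instead of being copied across 250 workbooks.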
- To answer your former question, I've become a, and this was actually true with a different issue that we had when I was there, where we had a Microsoft Access application that wasn't idempotent, and it needed to be run on a scheduled basis, and there was a problem with it that we didn't notice because the non-idempotency had messed it up, and it wasn't until a different consumer that was rather important noticed that our forecasts were the same week to week for a few weeks now, and we were like, uh-oh. So redoing that now, I mean, I've become much more on the idempotent train, much more on the functional train, where I would be trying to bake in as much of those as we can. Storage is cheap, which was even true enough then, but it's much more true now, and so why would we bother trying to mutate state in place when we could just have a much clearer lineage about how these things get transformed from place to place? So, it's all of the really cool and interesting stuff we talk about, like, organizing our project correctly and making sure that your tables are well named and all that other good stuff, where you don't just hope that someone else can read VBA and has your brain. - Yeah, I often wonder if weather forecasts ever work that way, where, like, somebody somewhere, it's like, oh, like, you know, the weather forecast is the same as it was last week, and somebody didn't run something, you know? - I absolutely would believe that. - At a local level, that's gotta be possible. - For sure. - I often wonder if the precision is cut off because there's, like, moss on the thermometer, and it's just like, sorry, we could only get to one decimal point, hopefully that's good enough. - Yeah, it's like literally in rotation probably. - Yeah, it's probably why they do it at airports because, you know, they have to-- - Yeah. - Those instruments clean? - Yeah, good point, yeah.
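Editor's sketch: the idempotency point from earlier in this exchange can be illustrated in a few lines of Python. Everything here is an assumption for illustration only: the function names, the run date, and the naive mean "forecast" are hypothetical stand-ins, not the Access application Dave describes. The first function mutates shared state, so a rerun changes the answer; the second writes an immutable snapshot keyed by run date, so a rerun with the same inputs is a no-op and the lineage stays visible.

```python
def mutate_in_place(state, new_actuals):
    """Non-idempotent: each run appends to shared state, so a rerun double-counts."""
    state["history"].extend(new_actuals)
    state["forecast"] = sum(state["history"]) / len(state["history"])
    return state


def run_forecast(snapshots, run_date, history):
    """Idempotent: output depends only on the inputs and is stored under its run date."""
    forecast = sum(history) / len(history)  # naive mean as a stand-in model
    updated = dict(snapshots)  # copy, so earlier snapshots are never modified
    updated[run_date] = {"inputs": tuple(history), "forecast": forecast}
    return updated


# Rerunning the idempotent version with the same inputs changes nothing.
history = [100.0, 110.0, 120.0]
once = run_forecast({}, "2008-10-06", history)
twice = run_forecast(once, "2008-10-06", history)

# Rerunning the in-place version silently shifts the forecast.
state = {"history": [100.0, 110.0], "forecast": None}
mutate_in_place(state, [120.0])
first = state["forecast"]
mutate_in_place(state, [120.0])
second = state["forecast"]
```

Keeping one snapshot per run costs storage, which is the trade Dave argues is now cheap enough to make by default.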
I mean, you gotta think about the meteorologist, you know, who, it's raining outside, and the forecast comes in, and it's different than the actual-- - Yeah, absolutely, yeah. - Absolutely, I mean, have you guys read that article? Gosh, it was on Hacker News the other week about, like, crazy real life bugs, and the bug was, the Wi-Fi works when it's raining and not when it's not. - No! - It's fantastic. - So it's worth looking up to hear the whole story, but basically there was a guy who came back from college, and he was always tech support first thing, but his dad was also very capable, and he was like, yeah, I don't know why, but the Wi-Fi only works when it rains, I haven't looked into it yet. And-- - Right, which depending on where you live is like, where you're not knocking on the internet that much. - For sure, well, and also that's backwards from what you would expect in a whole host of things. - Yeah, definitely. - So the long story short is that they got their internet from a microwave beam that they were getting from an opposing house, and what had happened 20 years ago was that someone had planted a tree, and so when it rained, it weighed the leaves down enough to be clear, and when it stopped raining, it was just enough to block most of the signal. - That's awesome, I love that. So when it snows, it's just like, perfect internet. - Oh yeah, that's what one would imagine, exactly. - Yeah, then as I recall, the article wasn't there for winter time, so I can't speak to it. But one imagines. - Man, that is so great. Okay, well, just a reminder to the listeners, we're here with David Nguyen from Edge Delta, and we're chatting about Wi-Fi signals, VBA and forecasting and breaking up Excel workbooks, but John, you had a bunch of Google questions, I have some too, but David, tell us, just give us a brief overview of your time at Google, 'cause you worked on all sorts of stuff, but how long were you there, and what were sort of the biggies?
- Yeah, I was there for about seven years. We started, I started when there were enough customer engineers to fit in one training room across the entire world, which we did once, and we didn't have enough training material to last the entirety of two days, so half of a day was scheduled for five minute lightning talks from every person in the room, which was fascinating 'cause it could be on any topic that you weren't in, so that was the vibe. It was young and it was fun. We also didn't have all of the enterprise things that people routinely demand when I joined, like being able to directly peer to Google. That was a pretty big one, that didn't exist at that time. This was also before Kubernetes and before GKE and before various other things. So I was out there talking to people directly about architecture, how they could migrate to the cloud, how they could re-architect so that things might be more effective across the entire suite of products. I like to joke that the job was not that hard. All you had to do was know the some 300 products that we had, know the some 400 products that AWS had, know some 500 open source offerings, and how they all fit together in every conceivable scenario. It's not that big of a deal, but that led to an interest in basically all different flavors of stuff 'cause at some points I was territorial, where I would cover the entirety of the West Coast 'cause that's how territories go when you're early, to smaller and smaller territories. And then I started focusing on an industry because I have tried to quit video games several times. And I'm sure I'll succeed one of these days, but I figured maybe I should make that more of a job thing because we had several notable gaming customers at GCP. Niantic, makers of Pokemon Go, was probably the biggest one earlier on, but there have been others like Unity and Apex Legends and various other things have also used different degrees of Google Cloud, which I may or may not have had a hand in. 
And so yeah, like that's where I was for most of that time doing the customer facing architecture side and then also doing a little bit of partner stuff as well. - Nice, so Google Cloud, I mean, that is probably the most like fundamental, like if I picked like seven years, you know, to be at Google Cloud, it seems like those were some of the most transformational years. - Totally. - Like what, so we want to talk more about BigQuery. - I'm gonna argue today is pretty intense. - We'll see, we'll see. - But yes, oh, we could probably have a chat about that for sure, but yes, it was definitely big times. - Yeah, for sure. So we want to talk about BigQuery, but were there any other like, you know, in your years there where like a product comes out, you guys are introducing a new product when you're like, like, wow, this is going to be incredible. Or maybe just we could talk BigQuery or if there's some other product that you felt that way about inside the ecosystem. - I don't think so. I think BQ is really the one that I'm the most enamored with just because it delivers so well on the core promise that solves so many key problems. Dataflow, which is Google's implementation of Apache Beam, which is the ETL framework, has promise, but it's too complicated. And I tried to understand it for a couple of halves there. I had it on my OKRs to try and figure this thing out. And usually when I had trouble, I would go ask the team and it'd be like, go read the source code. And I'm like, the last Java that I saw was at jail man high school in AP computer science and I cannot read this. That was in just like Java 5 or something. And they didn't have decorators, and I don't know what any of this syntax means anymore. So no, thank you. But I also knew enough about it to advise people on what architecture patterns they needed and what the common pitfalls were.
But even the Python SDK that they built, I think was just a little bit beyond what's pretty reasonable for people to get. So I think that BQ hits the right thing. I think VMs are very commoditized. I think GKE is great and is definitely probably the best Kubernetes platform, but I mean, that's borderline commoditized as well because everybody's doing Kubernetes. - Well, but I think you bring up a good point that I think a lot of companies struggle with, which is like they can have a brilliant solution to a problem that is not accessible enough to enough people to make a difference. - For sure. - And it seems like you're saying that BigQuery kind of hit that, like a brilliant solution to a problem and very accessible to a large number of people. Yeah. - The only challenge that you would really have is migrating the data and getting it in there. That was really the only one because if you have a petabyte capable system, your next problem is getting a petabyte of data in there. - Sure. - In order to make use of it. - Yeah. - So, sure. - What I'm interested in, we actually, interestingly enough, we have not talked about BigQuery much on the show, which I love that in recent shows you covered a bunch of topics. We got into, the other day we got into details of SAP, you know, details of that. - Yeah. - That was great. - Yeah, that's great. - Talked some hardware. - Yeah, totally. So that was awesome. - Good stuff. - Oh yeah, totally. It was great. But in terms of, you know, when you, I think there's sort of this perception of like, you know, you have Snowflake, you have Databricks, and then BigQuery's, you know, the third one on the list, but all the headlines go towards Snowflake and Databricks.
And, you know, I mean, part of that could be because Snowflake and Databricks are sort of, that's the main thing they do, whereas Alphabet is gigantic, and Google Cloud is, you know, a sprawling list of products, only to be dethroned by the AWS, you know, portfolio of products. But in terms of sort of the Snowflake, Databricks, BigQuery, give your perspective on that. I'd be interested. - And one other thing here, like, think about what we're talking about. We should be talking about Microsoft Azure SQL, AWS Redshift, Athena, whatever, and BigQuery. That should be the conversation. - Man, I need to get out of the, I need to get out of the digital data sphere and stop reading the headlines about, you know, battle royale. - But I think it's a really significant conversation that there's like clearly like who should be, and then Oracle, like we just skipped over Oracle. Like those are the four people that should be in this conversation. And only one of them is, which is a big deal. Like, like, you know, Snowflake and Databricks are great too, but it's a big deal that BigQuery is in that conversation. And I'd be interested if like, if you have any thoughts on like why, how did that team win, how did that team beat out, you know, all these other products that should be just as viable, theoretically. - So first, I'm not going to dunk on Databricks or Snowflake, those are both great products and I've lost to, I'm not going to say which one, but I've lost to one of them more than I would care to admit when I was there in BigQuery. - Sure. - The challenge that I think comes with it is, especially when you're talking about a hyperscaler, there's a question of how much do I have to commit in terms of getting a return on what this is, right? Because if you're running most of your application in AWS or in Azure, you probably will just use whatever they have and kind of suck it up and deal with it. - Right, yeah.
- And most people, for better or worse, I would argue worse, but that's not what we're here to talk about, will not have GCP as their default and so will miss out on what goodness this could provide. So I think that really is what holds people back, whereas you look at Snowflake, you look at Databricks, a big core value prop is multi-cloud, do the whole thing, it doesn't matter where. It's like, yeah, that's not a thing that BigQuery could talk about for a long time. They just recently got towards that in the last couple of years I was there with federated queries and stuff, but even then, now you have even less of a tie to this platform that I don't know if I have to go learn and figure all out, and I have to give some empathy to that because here's a little bit of humble pie I'm gonna go ahead and talk about eating. I was in Google Cloud for like a bunch of years. I think I'm a pretty sharp guy, mostly. I thought I understood what cloud was. I hadn't really dabbled with AWS until about a month and a half ago, too terribly much, and I was very humbled at how different these two things were in so many respects, and I could see perhaps a lot of the architectural decisions they've made of like, oh, I see how they got there. I don't understand why I have to open so many browser tabs and I don't understand why all of the instructions are out of order. I don't know if you guys already know AWS, but trying to learn AWS in 2024 is insane because there is no from-ground-zero tutorial out there that is up to date. They're all half old with APIs and stuff that don't work. - It's a monster. Like it's a monster, absolute monster. - All I have to say is that at GCP, like that doesn't really exist. Someone is in charge of making sure it all works together and boy, is that a change. But I recognize that other people don't wanna take on what they expect to be that madness times two, if not more so. So I hear it.
- Yeah, this was almost 10 years ago, but I had to buy like Pluralsight classes to like get through some of it. So I did a bunch of like modernization efforts, you know, almost 10 years ago now. And even then the documentation was either fairly inaccessible or just like you said, I like, I don't know. So I just got a Pluralsight subscription and like, you know, walked through some of the classes. - Yeah. - ChatGPT definitely failed me in terms of trying to get up to speed on AWS because it was some number of versions behind. - Should have asked Alexa. - You should have asked Alexa. - I don't own an Alexa. - Or one of the Anthropic models, maybe it would be better. - Yeah. - Maybe they train those on the Amazon manual. - I should really sign up for Claude. I understand that one to be a bit more linguistically advanced, though not technically advanced. - Yeah. - Yeah, I've heard. - It is, I mean, one thing that I just returned to, what you said about the system working together and then also considering what you said about, I don't remember the specific name, but Google's implementation of Beam. - We don't have a lot of people on naming. We're not, they're never good at it. - Yeah, I mean, that's like the hardest, that's a very difficult thing to get really good at, especially with that level of product catalog breadth. I mean, there is a lot to be said for, okay, this is in a combined platform and if I were just gonna go to market and I could buy anything I wanted and build this perfect thing, that's great. But the reality for a lot of people is like, ooh, procurement is a beast and like, these things work together. And so even if it's not ideal, it's just a pipeline, right? It's gonna run and so did you see that dynamic a lot where it's like the advantage of a connected ecosystem can outweigh the challenge maybe or like the rough edges of an individual product? - Yeah, I think a lot of GCP customers can testify to that for sure.
And I think that has to do with the different development approaches that the different hyperscalers have. So AWS famously built on two pizza teams, right? You've got features and stuff that need to be shipped by relatively small teams. What that means is that your interface boundaries grow a lot. And what we see in 2024, if again, you're coming to this new, is there's so many checkboxes, they're so out of order. They have such different expectations around all these because this team built that checkbox, this team built that checkbox, this one did this. And you can feel it, whereas in GCP, like it just, someone is in charge of the console and the flow of it. And it just makes so much more top-down type of coherent sense that, yeah, whether or not the dashboarding solution inside of GCP is like the greatest thing since sliced bread, it definitely works and it definitely plugs straight into BigQuery and takes advantage of a ton of optimizations that they have under the hood that, like, keep everything fresh in a way that is harder to do when you're not. Shout out to all of the dashboarding solutions that do great stuff, not trying to knock any of them, but you know, there's just more cohesion that you can take from that perspective. - Yeah, sure. Okay, one more question for me on Google Cloud. Do you think that Google's, and this is a, how do I wanna ask this?
So Google's different business units, you know, at least I've never worked for Google, but just from my experience, you know, sort of building some technology on Google in a previous life, like even individual products can have like parts of them that are like pretty disconnected, not to the level of the Amazon sort of two pizza, like, checkboxes are out of order, but one interesting thing, at least as a user of BigQuery, like I use Google Cloud to scaffold out a bunch of personal projects, and it is very approachable, you know, just even using like their different APIs and other things is like very approachable. And so you can build out a project really quickly, right? And just everything works with BigQuery, and it is like super nice. Do you think that comes from Google's competency in consumer facing products, right? I mean, that's really where they came from, was like deeply consumer facing. - Like Gmail. - Yeah, exactly. You know, Search, Gmail, where there's a significant emphasis on, you know, sort of simplicity and flow, or is that disconnected? Because you could tell me either way, and I wouldn't necessarily be surprised, but I'm curious. - An obligatory disclaimer here that all opinions announced here in this podcast are solely the property of David Wen and not of any particular analysis of any entity. - And this does not constitute investment advice. - It does not constitute investment, legal, or technical advice, please consider everything I say stupid. The, I don't think so. I think, 'cause here's a really interesting thing about Google Cloud. What was Google Cloud's first product? You guys remember? - Storage, but I actually don't know. - So that's S3, you're thinking of. S3 was the first product for AWS, which was released in 2004. - Right, but Google buckets for Google. - No, Google's first product was App Engine, which is the entire development platform built in one.
Now the reason for that is that is how Google developers work internally. And so the idea was, down to the part where they actually run it on infrastructure inside of Borg, inside of the thing that runs Google, they said, this is the development model that we use here. Everyone should use the development model here. That didn't catch on for a lot of reasons. Partly, at least, because people would have to rewrite a lot of applications they didn't want to rewrite. So, oh, okay, maybe if we want to go get this market faster and more directly, I think AWS had a much better approach to that where it's like, let us give you exactly what you are familiar with, IT teams, and we will slice it up for you and charge you by the slice and have a nice little thing right here. Whereas Google tried to bring a bit more of the Google way of doing things. So when we talk about projects, which I do think is a meaningful boundary that they drew in GCP early on, I think that was more of a happy accident from the way that App Engine was structured because I do think it's a much more coherent way to organize stuff than, I mean, does AWS have a project boundary now? I feel like you can do some things with ACLs and stuff like that, but mostly it's still just like, I hope you logged in with the right account because here it goes. - I don't know, Azure has resources that are kind of like projects. - Google has the most clear boundary. I mean, you can tag things in AWS and you can have different accounts and you can have a unified account with subaccounts, but beyond that, I don't know. - Yeah, it is really nice in Google though. Like the other day I had this 250 page PDF exported from a note-taking app on an iPad and I was like, I don't know why I wanted to experiment with OCR stuff, but Google has some really very cool products around that and I mean, spinning up a project, because those are pretty heavy and so like they require you to add billing.
Well, I mean, we could discuss why they require you to add billing. That one makes sense 'cause, you know, you could really hammer the system. But it was, I was like, this is unbelievable. You know, like I ran a test in a couple minutes, you know, it was like super cool. - Yeah, it's good stuff, highly recommended, especially if you like light blue. It's got a pretty tight theme on there. - Yeah, it does. - I will, I will go on record. I assume Sundar is listening to this. Sundar, I'm gonna go ahead and tell you something that I didn't get a chance to tell you in person, which is that the old logo for Google Cloud was better with the rivets, it should come back. I recognize it didn't have all four colors and that maybe is branding standards. And like as a thing, but it felt nice. Anyway, that's my high horse. I'll just step up and step down real quick. - And Sundar, The Data Stack Show has a message for you. We would love for you to come on the show and talk about data at Google. - And the Google logo, you know? - And the Google logo. - Really, you set the agenda. It's great. - Yeah, yes, yes. Okay, that was great. That was great. Just as a reminder to the listeners who were driving and trying to look at maps and Twitter at the same time, we are here with David Nguyen from Edge Delta and we're talking about all things Google Cloud. What was next on the list though? - Well, we gotta talk some about observability, for sure. - Yes. - I wanna put this in here 'cause I'm curious about your take on BigQuery. So open source table formats, specifically Iceberg, are making a lot of splashes. - Big splashes. - And the concept is great, right? Like you can have this open storage concept that can be in S3 or GCP, like whatever storage you want, and then you're less locked into all these products. So then that pushes all the like battles up to the compute layer, right? So you got a Snowflake engine, you got a Databricks engine. - It's good for the consumer.
- It's good for the consumer, allegedly. So where does GCP stand with that? You think, this could be a thing today. Can you use GCP today to access data in Iceberg? - You know, that's a great question, but I'm afraid Iceberg came along a little bit after I left GCP. So I am not sure that I'm really equipped to answer it. - I'm asking Google. - Okay, perfect. - It's direct, I mean, it seems like directionally, right? That would be that, okay, Amazon, Oracle, Microsoft, like this is your chance. Like have a query engine that basically is just compute and access data in Iceberg, like ready, go. Do you think any of them will do it? - I mean, functionally, that's what Athena and BigQuery already do, right? They have separate compute stacks on top of some either proprietary or non-proprietary formats that they can spin up at will. - But embracing a new open source standard is really the question. Obviously they're capable technologically, but will they play in the Iceberg? - Yeah, I mean, they took on Parquet. I don't see any reason they wouldn't take on Iceberg, right? Like it's inevitable. The much more interesting questions to me are, as we evolve our understanding and our practice of what we need to do as data analysts and data engineers, how does that change what we need to do, right? 'Cause I just came back from Monitorama in Portland last week. Shout out to the organizers there, it was a great conference, much appreciated. There, one of the dominant themes, because in observability, we've got this concept of observability data, but it's not in tables, it's not in open formats, it's not in anything like that. Like there is this concept of OpenTelemetry, which does standardize the line protocol a little bit and has an agent associated with it. But mostly there's a whole bunch of all kinds of stuff floating around here from log lines to time series data to trace information, which is sort of logs, but with parent IDs and back and forth and stuff.
The old approach was what I call the Patrick Star model of, why don't we take all of the data over here and put it over here? So that at least then I don't have to go to 80,000 machines if I wanna see if something went wrong. That's like a level one improvement, for sure. And it was viable at gigs of data per day, but now we've got terabytes of data per day. Now we've got hundreds of terabytes of data per day. Now some of the big organizations are generating a petabyte of observability data per day. And it's like, we've gotta take this one step back and think about, okay, what are we doing here? (laughing) 'Cause you just can't move that all across the wire fast enough to matter. And so Edge Delta is obviously helping to push this forward, but there are other people that are in the same vein of like, we need to push that distributed processing down as close to where the data is as possible. So we can do aggregations and filtering and routing and stuff where all of that data is created. That's one method to think about it. But you know, we might even have to think about what kind of data it is we're making. And how do we use it? Because man, we've gotta tie the dots on these things. Can I talk about my worst meeting? You don't have to say yes, but maybe I'll go ahead and say it anyway. When I was at UPS, one of the things that inclined me to get out of data analysis was I had been given a charge and put together a dashboard and an analytical report on, I honestly don't even remember what. I remember working on it for, I think it was a week or something, you know, a good chunk of time, particularly as a young guy who didn't know what I was doing. And then I walk into the meeting and I start talking and I can see the guy's face change right away. And within 30 seconds, he stops me and says, David, this looks great, but I wanted to let you know this isn't what we're looking for. I wanted to see this and this and this. 
And I was like, huh, that was a week's worth of effort for a whole host of things that I thought were interesting, bubble-up style, that I'm now being told to do a little bit more top-down style in a different direction. And it's like, huh. - Something about this went wrong that I wasted. - Did he just ask for hard-coded values? He's like, can you just code these values? I want it to be up and to the right. - Man, I really wish I remembered the specifics, but I mostly remember his face. - Yeah, it was like, you know, if you go in with data to present something, and in the first 30 seconds, that big blinking red light on your internal dashboard is like, we've lost the audience here. - Yes, that's such a thing. I'm always interested in ways that we can tie this type of stuff closer together. And I feel like as analysts and engineers, we can get a little bit caught up in the properties of what this is, without thinking enough about how it ties back to the greater objectives of what we have that's actually going on here, right? And I think the next wave you'll see in observability, but honestly, a little bit from the analytical side as we start taking more and more control over our data via open formats or what have you, is, this needs to line up to the thing that we all need to do here. So how do we tie those things together so that we don't burn any cycles that we can miss on? - Yeah, for those keeping score at home, by the way, talking about open table formats, straight from Gemini: yes, you can query Apache Iceberg data in Google Cloud using BigQuery. BigQuery supports the Iceberg format through BigLake Metastore. - BigLake Metastore. - They're upping their naming game. - BigLake. - Yeah. - That's kind of cool. - Yeah, I kind of like it. - I like that they made sure you knew it was big. - Yes. - And lake. They got the data lake. - Yeah, it gets a little-- - What's the largest one? 
- The number of customers that wanted to change the name of, like, data lake, they're like, I think we don't want a data lake. We want like a data ocean. We want like a data galaxy. And you're just like, yeah, man, absolutely. Keep going. (laughing) - So our conversation about observability reminds me of something that we've talked about with RudderStack and AutoTrack. You remember-- - Oh, AutoTrack, yeah. - Right, so there's this problem, I'll let Eric describe it, but it's what you're saying. We're like, hey, let's just like go in and collect everything, right? And you have this decoupled, like, technical team that's like, I don't know what's useful, I'll collect and store everything. And then, like, downstream, you know, a business team that's like, I don't care about any of this stuff, and it's not like I might care about some of it, it's, I literally will never care about this piece of it. So every moment wasted engineering, collecting, tracking, storing, retrieving is complete waste. - Well, the interesting thing about that, so the context is, you know, so RudderStack collects user behavioral data, you know, so telemetry from like your website or app, et cetera. And early on in the life of the company, they had experimented with an AutoTrack feature, which is basically you install the script on your site and it just tracks every change in the DOM on your website. And just sends that as a payload. - Sounds crazy. - It's so noisy. Now, something really interesting, I don't know if you listened to this show, but we had someone from the analytics company, Heap, on The Data Stack Show. And Heap, one of their big differentiators was AutoTrack, and they stuck with it and actually ended up figuring out how to make it work. But listen to this. This astounded me. It took their engineers, because they, like our model is very different. We send everything, you know, to the warehouse or whatever. We don't actually store any of the data. 
We have, you know, sort of standardized schemas or whatever. But Heap is an analytics tool. So not only do they do collection, but they actually provide like an analytics visualization, you know, layer or whatever. But I think the guy, I think the guy said it took their engineering team like five or six years to build a system that had reasonable SaaS COGS on AutoTrack. - Wow. - And then they did an immense amount of work to reduce the noise. And now it allowed them to do some very interesting things because if you can actually solve those two problems, you know, then you do have an interesting data set to work with. - Right. - But like that was astounding, right? And it was actually, I mean, I seem to remember, I can't remember the exact details of the conversation, but the founders had to be extremely opinionated, both with like operators and investors, to say, we are going to have really bad COGS until we like figure this problem out, you know? - Right. - And so not only is there noise, but like the infrastructure impacts and sort of like underutilization of what is required under the hood to even process that is significant. - Yeah. - For sure. - Same with observability. - Absolutely. Well, I mean, you're basically talking about a different form of observability, right? When you're zeroing in on user and behavior, that's not what we traditionally think of in observability because we're looking at the application and its action. - AI. - Sure. - Presumably, usually users initiate those actions. - Yeah, a lot of timestamp messages. Okay, so let's talk about, we can't not talk about AI when we're talking about petabytes of disparate data. - I was just gonna say, can you imagine how disappointed everyone's gonna be that we've gotten this far without putting AI together? - It's like a game every week, honestly. We're like, how long can we push this conversation without talking about AI? We do pretty good. 
- Although we did like, you know, we did disregard Microsoft and Oracle, you know, in favor of the, you know, the darlings of the Valley. So, we at least checked that box, you know? And we talked about Iceberg, you know, so. - Okay, but legitimate question. I mean, when you think about petabytes of data, petabytes of different types of data in a context of observability, like of course, you go to, I mean, of course, the default is, can AI solve that problem, right? But it's a machine learning application, right? Where you're looking for, you're looking for anomalies in like a giant, you know, stream of data. But how do you think about that at Edge Delta? - Yeah, so we are using a very hybrid approach of traditional approaches, with sort of your standard type of alerts and search and various other things that go on that people would expect, as well as some machine learning driven sort of dynamic behavior, but it just makes alerting a little bit easier because we're rebaselining everything for you. So in that sense, we're letting the model, it is very hard for a constantly changing application to have fixed alerts that make sense over a long enough period of time 'cause you get drift. There just is no, there's not a great way to do it. And currently we're solving it by the fact that SREs hopefully remember what alerts they have, and if they have gotten quiet for too long, they'll go in and check them, or if they've gotten too noisy, they will fix them and not just send them straight to spam. - Yeah, we've recently experimented as well with putting some LLM AI on top of our anomaly detection. So that is some very high signal-to-noise type stuff. I call it almost like the two AM checklist of, if you get paged at three in the morning because something has gone wrong and you are like, oh my God, how did I ever keep this monitor this bright? I just wanna do a thing and get back to sleep. 
Like the AI, we've added a little LLM in there too, to just give you a, hey, maybe you wanna look at this, maybe you wanna look at this. - Yeah, yeah. - It doesn't auto do anything on purpose because-- - Sure, I mean, that's-- - You say sure, there are people that think that's the way forward, but-- - Not it, not at two AM, it is not. - If it gets the alert to go away, there are absolutely SREs that would push the button and do that. If there's no rollback button but there's an AI fix button, they would absolutely do it. - Exactly, I know, that's the reason it's not the solution. - Yeah, so it just makes suggestions for you along the lines of, hey, you might wanna look at this, you might wanna look at this, based on the anomaly and the information that we could correlate across the different information. So that's the direction that we are taking with it. - Yeah. - I personally, I am on record as thinking that AI is not going to quote-unquote fix observability, which I liken to, hey, we've got a petabyte of data, let's dump it in there, and it's like, yeah, are you gonna train that model? Do you, are you ready to spend all that information? And that's just the COGS side. Even more importantly, if developers are good at one thing, first of all, any developer listening to the podcast right now is amazing and never makes these errors, but other developers. - Right, the other ones. - If you've met any other developers, they're really good at creating new ways for software to screw up. And the idea that you will have a data set that has all of the errors that you could want to track in the future is comical. - Yeah. - Yeah, pretty cool. - And I've told the story several times of my favorite database error when I was working at the ETL company, which was a database error that I got that said, you haven't paid us. And I was like, what? And it turns out we were using a Salesforce syncing tool to go from local database writing. 
You basically circumvented this because you could write to your local database and they would handle the syncing into Salesforce. - Yeah, okay. But we forgot to pay them. So I was like, I've never seen that error again. Is it useful to have any LLMs trained on that? No, and that analogy holds to the infinite ways that we can combine bits together. So I'm very skeptical of the idea that AI is coming to fix observability in particular. And similarly, I'm a little bit skeptical but sort of in the broad, too. That's a bit more of an open question. - So the idea just in this one example would be like, okay, we've got an AI in place. It is going to be able to, never having seen this before, read this error message that says something very generic, like you haven't paid us, to know that there's some vendor out there that you're using to sync from A to B and to prompt somebody like, hey, you need to go pay them. Like that, yeah, that makes sense that that wouldn't work. - In the same context requirement. - Yeah, exactly. - Yeah. - Yeah. Well, I think of it, I really think one of the big challenges we have is if we're not directly at the frontier of research, we're getting a lot of second-degree assessments of what AI can do and what it can't do. And so I think what we really need, even for people in our position, and I can't speak to how familiar you guys are with it as well, not making any disparagement, it's just we need better mental models of what it's like. And so the one that I give to everybody is this. An LLM, our current understanding of LLMs as of time of recording, is it's a bit like a guy in a bar who has overheard 10,000 hours of conversations about motorcycles. So he's never seen one, he's never touched one, he's never ridden one. But if you ask him any question about a motorcycle, he probably knows the answer, but occasionally he might compliment how your torque smells. And so he doesn't know the difference, it's not his fault. 
And the way they work is, at least based on our current understanding, again, at time of recording, they don't reason, but they're associative. The reason that chain of thought works is that it snaps the words into a reason-like looking object. And so when a lot of AI products and startups pitch themselves on the idea that AI will be able to think or reason or do the decision-making part, I'm pretty skeptical. But if it can do some of the things that computers are very good at, like computers never get tired, so I think they're very good at brainstorming, pulling together different associative ideas. You know, a lot of baseline stuff that can help clear the blank page problem, I think AI'll be great for like a hundred different paper cuts in normal everyday life, just like data was. You know, 20 years ago or something, it was like, data's gonna change everything. It hasn't knocked out the economy, it's just made all of the little things that we do a little bit different. And I think we'll see that too. - And of the 10,000 hours of the guy listening at the bar, he was drunk for several hundred of them, but we don't know which ones, right? - Yeah, right. - And he's not so sure about the, he's got some hazy gaps, right? - Or he wasn't paying attention to who was drunk and who wasn't. - Or yeah, he cheated every conversation about motorcycles, including the one where he's like, guys, I just had the most amazing day. And he's like, okay, that guy sounded like he had fun. I'm gonna remember that. - Yes, training on Reddit data is the analogy there. - It's too good. All right, well, we're at the buzzer. I think one of my big takeaways is that being an SRE is like having children, in that, you know that really bad gut feeling that you get when you're like, our house is way too quiet. - It's just something wrong. - This is an excellent analogy. 
Unless there is a type of, maybe a type of pet owner that has a very defined cage and fence where they're okay with silence because they know exactly where everything is. - Yeah. - Hands all those SREs out there. - That's not true. - All right, well, Dave, thanks so much for joining us on the show. It was an absolute blast. And we'd love to have you back sometime. - Absolutely, we'll do it. - Take care guys.
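Editor's note on the re-baselining idea from the conversation: Dave describes alerts that drift out of date because the application keeps changing, and a system that re-baselines thresholds for you. A minimal sketch of that general idea, assuming a simple sliding-window mean and standard deviation, might look like the following. This is not Edge Delta's implementation; the class name `RollingBaselineAlert` and the parameters `window`, `threshold`, `min_samples`, and `min_spread` are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaselineAlert:
    """Hypothetical sketch: flag values that deviate from a baseline
    recomputed over a sliding window, instead of a fixed threshold."""

    def __init__(self, window=60, threshold=3.0, min_samples=10, min_spread=1.0):
        self.values = deque(maxlen=window)  # oldest samples fall off automatically
        self.threshold = threshold          # allowed deviation, in units of spread
        self.min_samples = min_samples      # stay quiet until there is some history
        self.min_spread = min_spread        # floor so a flat window ignores tiny wiggles

    def observe(self, value):
        """Return True if value is anomalous against the current window,
        then absorb it so the baseline follows the application."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            baseline = mean(self.values)
            spread = max(pstdev(self.values), self.min_spread)
            anomalous = abs(value - baseline) > self.threshold * spread
        self.values.append(value)
        return anomalous


if __name__ == "__main__":
    alert = RollingBaselineAlert(window=20, min_samples=5)
    for latency_ms in [100, 101, 99, 100, 102, 100, 101, 500]:
        print(latency_ms, alert.observe(latency_ms))
```

Because every observation is absorbed into the window, a sustained shift (for example, latency settling at a new level after a deploy) stops alerting once the window has filled with the new values, which is the drift-tolerant behavior a fixed threshold lacks. The trade-off is that a slow, steady degradation can also be absorbed without ever firing.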