Archive.fm

Cloud Commute

How to Build a Serverless Postgres? ft Gwen Shapira

Duration:
28m
Broadcast on:
16 Aug 2024
Audio Format:
mp3

When we started Nile, my co-founder and I, we really wanted something that would signify we're building a platform. So, for us, having this major river, this major artery that things were built on top of, was a very good analogy.

The most valued guy from my perspective was the one who built the Argo CD, the Jenkins pipeline, the Kubernetes cluster, all of that. Because in the end, it was literally: you commit to your Git repository, you make a new tag, the build pipeline picks it up, the deployment pipeline picks it up, and you just magically have the new version deployed. It was incredible.

So, we were like, how do we isolate them? How do we give them privacy and security? How do we give each one the performance they need? How do you scale up as the number of customers grows, which is different from scaling up as a single customer grows?

You're listening to simplyblock's Cloud Commute Podcast, your weekly 20-minute podcast about cloud technologies, Kubernetes, security, sustainability, and more.

Hello everyone, welcome back to this week's episode of simplyblock's Cloud Commute Podcast. I'm not saying I have another incredible guest, because today I have an actually super, super incredible guest, very special. Don't laugh. You know it's right. Gwen, it is a pleasure to have you. I don't want to say that all the other guests are boring, but to me, you are actually special. So, thank you very much for being here. Maybe just tell us a little bit about yourself: who are you, what do you do, and we'll take it from there.

Thank you. So, first of all, thank you for inviting me, and I'm super, super, super duper excited to be on your show. I am currently co-founder of Nile, where we are building serverless Postgres for SaaS, and before that I spent almost seven years at Confluent. So, Confluent, for those who haven't heard, is the Kafka company, essentially.
There are a bunch of Kafka companies now, and it's probably not very politically correct to say that, but it was the company founded by the people who started the Apache Kafka project, and it had huge investments in the Apache Kafka project over the years. I was there from when I was employee 10, 11 or 12; three of us joined on the same day, so I will never know. And by the time I left, it was thousands of employees, so that was quite a journey. I did a lot of different roles. I joined as an engineer, I moved to product, I moved to marketing, then I moved back to engineering, and then I did some engineering management. So I got to experience the ride, which kind of helped me prepare to start my own company in some ways, especially the stints in product and marketing. And before that, all of my background is basically in data. I started at HP, then I worked on Oracle for a very long time, I did some MySQL, Hadoop, and then it was Confluent and Kafka. So I have a long career. The way I like saying it is that I spent the last 20 years moving data from place to place, and I'm not done yet.

You see, that's exactly the reason why I say you're a very special guest to me, because I know you mostly from Twitter, these days X. And I've never seen you talk about anything but data, and that is amazing. Any kind of database, any kind of, as I said, data movement, data placement, whatever. And I specifically love all your engagement with the different communities, asking questions, actually meaningful questions. I think that is very different from a lot of other people. As I said, thank you for being here. It's awesome.

I'm the one who's excited.

You talked about Nile. I always want to say Nil, the German pronunciation of it. We talked about that before, in the pre-run to this session. Maybe tell us a little bit about Nile, your serverless Postgres, and why it is called Nile.

Yeah, the name question.
First of all, I'm from the Middle East, so I kind of grew up with the Nile and stories about the Nile and so on. And for me, it's the bedrock of what used to be a giant empire, and it's still the bedrock of so much commerce and so much life that has been going on around it. And when we started Nile, my co-founder and I, we really wanted something that would signify we're building a platform that businesses, and life in general, will be built on top of. And so for us, having this major river, this major artery that things were built on top of, was a very good analogy. I would also point out, completely coincidentally, that the Nile is slightly longer than the Amazon.

That is important. I get that.

It's slightly important, yes. So, yeah, we like the name. We like that it is four letters. It's short. It's memorable. So it has good things for a company name. And we actually iterated over several ideas. We started the company with the idea that, hey, at Confluent, building Kafka took N people. Building the entire cloud service, the self-serve, the SaaS on top of Kafka, took like 3N people. It was much harder. And this was a surprise to me. As I said, I was always into building data infrastructure. I thought, this is hard, and it is really hard. I didn't realize that there is something that is actually much harder that comes on top. So we really wanted to build something that would reduce the overhead of building these services around the service. Our first iteration was a control plane. We were not very happy with that, and we didn't find other people who were happy with it either. And then we were thinking that the kind of problems we wanted to solve were actually about managing multiple tenants on the same platform, which is the core problem of SaaS. How do you provide one service to a lot of customers with very different needs, very different requirements, very different workloads? So we were like, how do we isolate them?
How do we give them privacy and security? How do we give each one the performance they need? How do you scale up as the number of customers grows, which is different from scaling up as a single customer grows? So we took all those ideas and realized that the right layer to solve them is actually the database. The idea is that what you actually need to isolate is data access. All the authorization, authentication, those are all data-level concerns. They should be enforced close to the data. Performance has a lot of factors, but obviously the database is major in allowing companies to scale, and you see it in blogs again and again. Things like flexibility: can you customize your product for one customer versus the other? A lot of times that also ends up being a database concern. So we were like, okay, we have to take a good, solid database that is open source and adapt it to the concerns of people building software as a service, because those are not adequately addressed today at the data layer.

And you took the only meaningful option, which is Postgres.

We like Postgres a lot. If you're going to do a serverless database and host it and manage it, you cannot pick a database that you don't really, really like. You have to believe that it is the best database ever; otherwise, you know, it's hard to run something as a service for other customers. If you don't believe that you're really choosing something solid, it's probably impossible.

I would totally agree with that. And I'm a big Postgres fan myself, for all the same reasons. Just quickly: the Postgres ecosystem is massive. There are so many extensions. I guess you can run extensions on Nile? Can you make your own extensions?

We may get there at some point, but we're not quite there, and partially it's because of the... Yeah, we are currently still limiting what people can do. You also cannot even run your own functions at this point. We will get there.
We are still only a year into the startup journey. But at this point, what we are doing is taking popular extensions and baking them into Nile. So our customers pick up the phone, or realistically pick up Discord, and say, hey, we need this extension: PostGIS, pgvector, all those things. pgcrypto has been super popular. And we're like, okay, we're on it. And we're also writing our own extensions. I mean, obviously, Nile itself is an extension. And we also have team members who have their own open source extensions that are quite popular.

Right. I think that is a totally fair approach. And I guess a year in, you probably have most of the extensions that people want anyway. So you're probably good on that side.

We try to be responsive to customers. That's only right when you're a startup.

Right. So you mentioned that you're building your own extension. You say serverless Postgres; Postgres by itself isn't really serverless. So what would you say was the biggest challenge or hurdle you had to overcome to make Postgres serverless?

You know, it really scares me to answer this question, because deep down I feel like we have not met the biggest challenge yet. I feel like the other shoe is about to drop. I feel like it's still ahead of us. So I would want to qualify it as the biggest challenge to date. We don't know what we don't know, and this is maybe the scariest part. I think there are two things that were challenging, and they're actually intertwined. Maybe three things. One of them is the transactional guarantees. A big reason that people pick Postgres and not MongoDB is that it's a relational database with strong ACID guarantees. Postgres is actually one of the best databases in terms of transactional guarantees. When it says serializable, it's serializable snapshot isolation, which is better than snapshot isolation. It has better guarantees. So they did so much good work.
And that opens up a lot of questions. Can you have a transaction that involves multiple tenants, or do you limit transactions to only one tenant? How does it work as you try to scale? So we had to deal with those concerns, and we chose to take limitations. Your transactions have to be for a single tenant at this point. We don't let you transactionally update an entire table with all the tenants in it. Some people don't love it, but we feel that this is our bread and butter: isolating changes to individual tenants.

Why would you do writes over multiple tenants? That sounds scary.

Yes. And when we allowed it, we found that customers often regret it. The number of times that you deleted or updated an entire table because you unintentionally left out a WHERE clause is, yeah, distressing. And so we are opinionated, to an extent, in preventing people from shooting themselves in the foot. So dealing with the transaction limitations has been an interesting experience. But there are things that do have to get distributed to all the tenants. For example, no matter how much we isolate tenants to their own databases, if you're adding a new column to a table, you need every single tenant to have this new column. And so dealing with those distributed DDLs has been really interesting. Storage is interesting. And I think the interesting bit is how intertwined everything is in Postgres. We want to have the tenant isolation at the storage layer, which we really want because this is the magic that allows us to move tenants to new machines as you get more and more customers. We can kind of also shard it for you and spread it out. And this means that every data block has to know which tenant it belongs to, and every record in the WAL has to know which tenant it belongs to. So it's kind of a big rabbit hole that you go down. Transactions in the commit logs have to know which tenant they belong to.
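As a rough illustration of the two constraints described here (transactions scoped to a single tenant, while schema changes must reach every tenant), the following sketch uses one in-memory SQLite database per tenant as a stand-in. It is not how Nile is implemented; the `TenantStore` class, its method names, the tenant names, and the `todos` table are all hypothetical.

```python
import sqlite3
from contextlib import contextmanager


class TenantStore:
    """Toy model (hypothetical): one isolated in-memory SQLite database per tenant."""

    def __init__(self, tenant_ids):
        self._dbs = {}
        for tenant_id in tenant_ids:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT)")
            conn.commit()
            self._dbs[tenant_id] = conn

    @contextmanager
    def transaction(self, tenant_id):
        """Open a transaction that can only ever touch one tenant's data."""
        conn = self._dbs[tenant_id]  # raises KeyError for an unknown tenant
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise

    def apply_schema_change(self, ddl):
        """Fan a DDL statement out so that every tenant ends up with the same schema."""
        for conn in self._dbs.values():
            conn.execute(ddl)
            conn.commit()


store = TenantStore(["acme", "globex"])

# A write is scoped to exactly one tenant.
with store.transaction("acme") as tx:
    tx.execute("INSERT INTO todos (title) VALUES (?)", ("ship release",))

# A new column has to exist for every tenant, so the DDL is distributed.
store.apply_schema_change("ALTER TABLE todos ADD COLUMN done INTEGER DEFAULT 0")

# The other tenant sees the new schema but none of acme's rows.
with store.transaction("globex") as tx:
    globex_rows = tx.execute("SELECT COUNT(*) FROM todos").fetchone()[0]
with store.transaction("acme") as tx:
    acme_row = tx.execute("SELECT title, done FROM todos").fetchone()
```

Because there is simply no API here for a transaction that spans tenants, an accidental "update everything" cannot cross a tenant boundary, which is the opinionated trade-off described in the conversation.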
And so it's been an interesting journey. And as I said, I don't think we're at the end of the journey at all at this point.

Right. That's fair. If you're making such a big change to a system which was, well, not necessarily not designed for it, but it wasn't a thought when they created the actual system, that is really interesting. I think it's always fair to say we don't know what's on the road ahead, because almost nobody did this before. There are some others that say serverless Postgres, but I think it really depends on how you actually define serverless.

So serverless is another rabbit hole.

That is true. So, as a developer, how would I get started? I mean, it is Postgres, so I guess any Postgres client works for me.

Absolutely. Any Postgres client, any ORMs that you enjoy. We tested with the popular ones, and we have examples. As you probably know, we share a passion for developer experience, and making sure that developers of all kinds, experienced, beginners, this language, that language, are covered. Meeting developers where they are is so important. So we have tutorials for any popular ORM, like Prisma and Drizzle and SQLAlchemy and Hibernate. We have Django. We basically try to cover everything that people use and just give a small example. And you use it completely normally. There are only two tricky bits that you need to get. Nile shows up with some tables already built in, like the tenants table. You need to pull them into your ORM and generate an object from the database, which is something that some developers haven't done before, but it's actually very easy in every ORM.
And the other one is that, in a transaction, you need to specify which tenant the transaction is for, because that's the entire point. This is the main point of most of our examples: here is the snippet of how you specify the tenant in Prisma, in Hibernate, in Drizzle, and so on. So we kind of tie it end to end. Because we are very SaaS focused, we believe that the tenant will probably be a header in an HTTP request, and there will probably be a JWT or a session with the user. And so we can tie the HTTP header down to the transaction, we can tie the user down to the transaction, and we can validate that the user actually has access to the tenant that is mentioned in the transaction. So this is the whole point of pushing all this information from the browser layer all the way down to the database, so we can do this kind of thing.

Right, right. Okay, cool. And I can just go to, what is it, thenile... I knew it was not .io.

thenile.dev. The .io was probably free, but especially since we didn't start with that, I think we thought that I/O is not really what we're dealing with. We are all about developers.

Okay, fair enough. So, thenile.dev, and I sign up for an account and I get all the credentials I need.

Exactly.

All right. So, we're a cloud podcast, so who would we be not to ask the question of all questions: what does that look like backend-wise? I mean, we talk a lot about Kubernetes here.

Yeah, we actually migrated to Kubernetes fairly recently. We started out with just EC2, so a bunch of EC2 machines and some Pulumi scripts to deploy stuff on them. And it worked for us for a very long time. It got really messy as we added more services. It got pretty messy to do releases. And that's when we moved to Kubernetes. We now have Helm charts, we have Flux CD. So everyone with a service can basically upgrade the version. It actually happens automatically.
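To make the earlier point about tying the tenant from the HTTP header down to the transaction more concrete, here is a minimal sketch in plain Python. It only illustrates the flow (read the tenant from a header, read the user from the session, check that the user may act for that tenant, then open a tenant-scoped transaction). The header name `X-Tenant-Id`, the `MEMBERSHIPS` table, and all class and function names are assumptions made up for this example, not Nile's API.

```python
from dataclasses import dataclass, field


class TenantAccessError(Exception):
    """Raised when a request names a tenant the user may not act for."""


@dataclass
class TenantScopedTransaction:
    """Hypothetical stand-in for a database transaction bound to one tenant."""
    tenant_id: str
    user_id: str
    statements: list = field(default_factory=list)

    def execute(self, sql):
        # A real system would send this to the database with the tenant
        # context attached; this sketch only records what would be sent.
        self.statements.append((self.tenant_id, sql))


# Hypothetical membership data: which users may act for which tenants.
MEMBERSHIPS = {"alice": {"acme"}, "bob": {"acme", "globex"}}


def begin_for_request(headers, session):
    """Tie the request's tenant header and session user to one transaction."""
    tenant_id = headers.get("X-Tenant-Id")  # assumed header name
    user_id = session.get("user_id")
    if not tenant_id or not user_id:
        raise TenantAccessError("request must name a tenant and a user")
    if tenant_id not in MEMBERSHIPS.get(user_id, set()):
        raise TenantAccessError(f"{user_id} has no access to {tenant_id}")
    return TenantScopedTransaction(tenant_id=tenant_id, user_id=user_id)


# Alice belongs to acme, so her request gets a transaction scoped to acme.
tx = begin_for_request({"X-Tenant-Id": "acme"}, {"user_id": "alice"})
tx.execute("SELECT * FROM todos")

# Alice does not belong to globex, so the same call is rejected.
try:
    begin_for_request({"X-Tenant-Id": "globex"}, {"user_id": "alice"})
    denied = False
except TenantAccessError:
    denied = True
```

In the design described in the conversation, this check is pushed down into the database rather than living in application code; the sketch keeps it in one function only to show the order of the steps.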
So you merge, and then it triggers a bunch of tests, deploys to dev, and opens a pull request to upgrade production. And if you merge the pull request, it kind of goes out. So we have a fantastic infra engineer who just set it all up. Good infra engineers are just so, I think, underrated. They take for granted the fact that they actually do quite amazing things. They're like, no, this is obvious, this is how you do work. But then for me, it's like, oh my God, it's all magical.

I completely agree with you. It was the same with my startup. I'm not saying that all the other guys were really bad, but the most valued guy from my perspective was the one who built the Argo CD, the Jenkins pipeline, the Kubernetes cluster, all of that. Because in the end, it was literally: you commit to your Git repository, you make a new tag, the build pipeline picks it up, the deployment pipeline picks it up, and you just magically have the new version deployed. It was incredible.

It saves other engineers so much time. It's just this giant force multiplier, and it prevents all kinds of random incidents. It possibly generates some new ones, but I think on balance it's just been pretty amazing.

I think the problem is that a lot of that is fairly invisible. It's kind of the same problem we had in the past, where frontend engineers or graphic designers were the ones everyone loved because they had something to show. And as a backend engineer, you were the one saying, yeah, I've been working on that for weeks, and hearing, show me.

Yeah, it's really hard, especially in larger companies where you have formal performance reviews and performance calibrations across the company. There is this thing where you have to show the impact of an engineer, right?
On one hand, you have this frontend engineer who moved the button two pixels to the left, and suddenly conversion is up by 50%, with huge impact on the entire company. But what he did was move a button by a few pixels. And on the other hand, you have someone who completely rewrote a core part of the storage engine to be 30 times faster and 100 times more reliable. And okay, but what was the impact? Well, customers no longer lose data, and they're slightly happier because it's faster.

Yes, yes. That is so true. It's so sad. I've been there in my career at least once, probably.

Yeah, I do think that infra engineers should not work for large companies. I mean, it's tongue in cheek. Obviously, a lot of them are very happy at Google and AWS. But I do feel like if you want to be noticed for your impact, in a small company it is so noticeable, because everyone is an engineer. Everyone sees what you're doing.

But I think that is true in general. I mean, with simplyblock, we're still a super small startup, around 16 or 17 people. And I chose to go back to that small-company environment exactly for that reason, because you have influence. You really matter still. You're not just a number.

Yeah. And like, how did we even move to Kubernetes? Well, we hired the infra guy, and he talked to developers, and they were like, oh yeah, this sounds like a good idea, let's try it. And then he did it, and I was like, oh yeah, this is so much better. Okay, we're going.

Yeah. You see, that's the impact I'm talking about, right? He came in, he made a suggestion, and you say it was a good idea. It's great. It's much better than it was before. One more thing, because simplyblock is a storage company. You mentioned that storage is a problem for you. In what sense?
Yeah, I mean, I wouldn't say that storage is exactly a problem, but it is a part of Postgres that we had to figure out how to modify, like finding the spare bits in the block where we can put tenant identifiers, that kind of thing. But I would say that in general in the industry, if you look at incidents across the board, and there has been plenty of research, storage outages are kind of a leading cause of outages. And obviously storage performance has a gigantic impact on the performance of any system. You can only be worse than your storage. Your performance cannot really get a lot better than your storage. So this is a major thing. And I think of the idea of tiered storage, which is something that I think Confluent kind of pioneered. I wasn't on the teams that pioneered it, but I was fairly close to it. The idea is that you have some data that has to be on expensive disks and very fast, and some data that may have to be high throughput and very reliable, but where low latency is not exactly the concern, and maybe actually having a lot of copies is more important. I think this is one of the most important ideas when it comes to serverless and when it comes to building cloud native systems.

Okay, I think that is a perfect answer. I could not add anything to that. I mean, who would I be, anyway? All right, we already crossed the 20-minute mark by a few minutes. So, we have the last question for every guest: what do you think is the next big thing? It could be databases, a lot of people talk about AI, it could be anything you like.

I'm kind of struggling. As much as I want to talk up the future of databases, I'm struggling to think of anything bigger than AI going on. One of the big questions is whether the idea of RAG, of vectorizing data, storing it in a database, and using vector similarity to enhance the performance of AI, is here to stay.
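For readers unfamiliar with the retrieval step of RAG mentioned here, a minimal sketch follows: documents are stored alongside embedding vectors, the query is embedded the same way, and the most similar documents are pulled into the prompt. The tiny three-dimensional vectors and the document texts are made up for illustration; a real system would get embeddings from a model and would typically let the database (for example Postgres with pgvector) do the similarity search rather than Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, documents, k=2):
    """Return the texts of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up "embeddings"; a real embedding has hundreds or thousands of dimensions.
DOCUMENTS = [
    ("Postgres supports serializable transactions", [0.9, 0.1, 0.0]),
    ("Kubernetes schedules containers", [0.0, 0.2, 0.9]),
    ("pgvector adds vector indexes to Postgres", [0.8, 0.3, 0.1]),
]

# Pretend this is the embedding of a question about Postgres.
query_embedding = [1.0, 0.2, 0.0]

# Retrieve the closest documents and prepend them to the model prompt.
context = top_k(query_embedding, DOCUMENTS, k=2)
prompt = "Answer using this context:\n" + "\n".join(context)
```

The brute-force scan over every document is fine for a handful of rows; vector indexes exist precisely so that this lookup stays fast as the table grows.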
Every indication tells me that the concept is here to stay and the techniques are just getting better and better, both in terms of embeddings and also techniques outside of embeddings. I also see pgvector keep getting better and better, with a lot of contributions. Performance improvements are going in, new indexes, improvements to indexes. And I don't know about the vector databases. I know it's controversial; there are a lot of people probably sitting with a vector database. But I don't know that there is tons of appetite, especially in small startups, where a lot of the AI is happening, for yet another database, versus, I think, so many people saying, just let me do everything in Postgres. And if Postgres is maybe a few percentage points slower than the absolute leading-edge vector database, I think a lot of people will absolutely take the bargain, because, just let me do everything in my Postgres. So I do think that, yeah, RAG is here to stay, vector similarity is getting better and better and faster and faster, and I do think that pgvector is going to own a huge slice of it.

All right, cool. Yeah, I think that is a very nice last sentence: RAG is here to stay. Thank you very much. We are at 26 minutes. I think we have to cut it here. Thank you very much for being here. Awesome, awesome chat, just like the pre-chat we had before the recording. It is such an honor to have you, actually.

As I said, I'm the one who was excited here, so thank you so much. It's been a pleasure.

All right. And for the audience, you know how it goes: same time, next week, same place. I hope you're coming back and listening again. Thank you very much for being here as well. Thank you very much.

[MUSIC PLAYING]