Cloud Commute

Getting Started with Graph Databases with Jennifer Reif from Neo4j

Duration:: 26m
Broadcast on:: 12 Jul 2024
Audio Format:: mp3

And I always like to give the example that it took me learning Cypher in order to understand what the SQL HAVING and GROUP BY clause was trying to do.

You're listening to simplyblock's Cloud Commute podcast, your weekly 20-minute podcast about cloud technologies, Kubernetes, security, sustainability and more.

Hello, welcome back everyone. Welcome back to the next episode of simplyblock's Cloud Commute podcast. This week, I have another incredible guest. I know I say that every single time, and it's just true. They're all incredible. And you know that, right? So, hello, Jennifer. Maybe just give us a quick introduction.

Sure. My name is Jennifer Reif. I'm a developer advocate at Neo4j, focusing on Java technologies and its ecosystem. So I cover the gamut on almost anything Java. I've worked at Neo4j since 2018, let me put it that way. It's been a little bit. I show up at some conferences, as well as write blog posts, do videos, a little bit of Neo4j's podcast, graphstuff.fm, and code demo projects and presentations, the whole nine yards. So I am happy to be here to talk to Christoph and chat a little bit about technology and Neo4j and so on.

Awesome. I think you're actually the first person who ever said Christoph on the stream. So now people know what I'm really called. Chris is fine. It's so much easier for the rest of the world. But you said you're working for Neo4j. Obviously I know what Neo4j is, but maybe just give the others a quick introduction. What is cool about it? What is it?

Sure. So Neo4j is a graph database. And I guess to start off, just like any other database, it stores data. A lot of people will say, "Oh, is it a layer on top of another type of database?" No, it actually is a storage system.
You store the data, it writes to disk, the whole gamut there. But it stores data differently than rows, tables, documents, and so on. It stores data as entities and then relationships between them. So you actually write the relationships to the database. That makes it really easy to read those relationships back. So anything where you have a lot of complex relationships, or a lot of relationships and a lot of hops through different types of data, a graph database is going to be optimized and more performant for those types of queries. So lots of things like networks or social network structures, supply chains where you have a lot of depth and hopping around, even fraud detection. There's a variety of different use cases: software dependencies, lots of other things. I've seen it used for kind of hit or miss, just kind of random things. And it's like, "Oh, I would have never thought to use a graph for that," but it worked really, really well, for any type of case where you have a lot of relationships and a lot of connections in your data.

That's interesting. I think the weirdest thing that I've built, and at the same time the most efficient thing, was actually a permission system with inheritance: roles and permissions, and inheritance between the different roles. Because you can basically make a single Cypher request and say, give me every permission that is somehow in the hierarchy, in the inheritance graph, and remove everything that might be overridden as, wait, what is the term? Denied. That's it.

Yeah, blocked.

I like that. So that was really nice. And it was so much easier than doing a table-tree kind of recursive SQL lookup on a relational database.
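To make the permission lookup described above a bit more concrete, here is a minimal sketch in Python. It is not the code mentioned in the conversation: the role names, the relationship names (INHERITS_FROM, GRANTS, DENIES), and the Cypher query in `CYPHER_SKETCH` are all assumptions made for illustration, and the query itself is untested. The Python function walks a small in-memory role graph the same way the single graph query would: collect every granted permission reachable through inheritance, then remove the denied ones.

```python
from collections import deque

# Hypothetical Cypher for the same lookup (labels and relationship
# types are assumptions, not taken from the conversation).
CYPHER_SKETCH = """
MATCH (r:Role {name: $role})-[:INHERITS_FROM*0..]->(:Role)-[:GRANTS]->(p:Permission)
WHERE NOT EXISTS {
  MATCH (r)-[:INHERITS_FROM*0..]->(:Role)-[:DENIES]->(p)
}
RETURN DISTINCT p.name AS permission
"""

# Tiny in-memory stand-in for the graph (illustrative data only).
INHERITS_FROM = {
    "admin": ["editor"],
    "editor": ["viewer"],
    "intern": ["editor"],
    "viewer": [],
}
GRANTS = {"viewer": {"read"}, "editor": {"write"}, "admin": {"delete"}}
DENIES = {"intern": {"write"}}


def effective_permissions(role):
    """Return granted permissions reachable via inheritance, minus denied ones."""
    seen = {role}
    queue = deque([role])
    granted, denied = set(), set()
    while queue:
        current = queue.popleft()
        granted |= GRANTS.get(current, set())
        denied |= DENIES.get(current, set())
        # Follow INHERITS_FROM edges breadth-first, visiting each role once.
        for parent in INHERITS_FROM.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return granted - denied


print(sorted(effective_permissions("admin")))
print(sorted(effective_permissions("intern")))
```

In this toy data set, "intern" inherits from "editor" but has "write" denied, so only "read" survives. In a graph database the traversal loop disappears into the variable-length pattern of the query.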
Yeah, I think I still have the code somewhere.

That would be really cool. You should publish that somewhere.

I can try to find it. And, well, let's see, maybe I hand it to you. It's like a genealogy or family tree. It's just a couple of lines. It was, I think, three types and four relationships or something, and you're done. It was brilliant.

Yeah.

Anyway, you said it's a graph database, and you gave a couple of ideas what a graph database could be used for. And I hinted at why graph databases might be easier, especially when you do topology or any kind of relation lookups. You said social networks, family trees, anything like that where you have relations, especially when you look at history, or European history, between the different kings' families. There's a lot of connections and relations between almost all families. So if you're trying to understand or look into those kinds of things, graphs are super helpful and much easier. But what would you say is the biggest difference from a typical database, for example a relational database? Except, as you said, Neo4j is a graph database, so it's slightly different.

Yeah, I'm slightly biased, so I have a long list of things I love a graph database for over other things. But if I had to narrow it down to just one, the thing that I find the most impactful is that you don't need to have expert knowledge about the data model in order to pull valuable data from a graph database. So you had mentioned you have a few different types of relationships. You don't have to know what those relationships are going into the graph database.
You say, "Hey, look, I know I have these entities, find all the ways they're connected, and remove the connections that are the denied or blocked or whatever credentials or access paths." You can filter those types of relationships out. With a relational database, sure, that's probably possible, but the amount of work and the amount of knowledge you have to have up front, first of the data model and second of SQL, in order to handle those very complex filterings and subqueries and so on, is a lot higher. That learning curve is a lot higher. So that's the thing that I love most about graph databases: the data model itself is not required to be known up front. And then it's naturally very visual. So it's just easier to navigate and easier to explore without having this massive learning curve up front to know the data.

I love that. Specifically, as far as I remember, Neo4j was involved in a lot of analytical use cases, things like the Panama Papers, right? As far as I remember, the whole Panama Papers network was basically put into Neo4j, and then the journalists started analyzing this massive graph and how all those companies work together. And that is exactly what you said, right? You don't have to understand or know yet how those things are connected. Is it people? Is it companies that somehow work together that make the relation? You figure that out while you're looking at the data, while you're looking at the graph and trying to understand what that means.

Yeah. My favorite thing is to just take a data set that looks interesting to me, dump it into Neo4j, and then just start querying and see what interesting things I find from it.
And then that's what I end up focusing on and playing around with, where I feel like with a relational database it's almost the opposite. You have to really figure out and look at the data and the spreadsheets or whatever data format you have and figure out, okay, what does the structure look like? How can I make the connections from one hop to the next table and so on? A graph is a little bit of the reverse there.

Yeah. Well, I'm not sure if that's a general graph database thing or whether it's very specific to Neo4j, because you don't necessarily need a schema.

Yeah. I know there are some other graph databases that have that schema-optional, schemaless, schema-free approach, however you want to term it. So Neo4j is not the only one in that category. But I feel like, just with the length of time that Neo4j has been around, we kind of have a leg up on a lot of the other graph databases. For those that do provide that capability, it's just a really nice feature.

Right. I'm asking because I think for relational databases, one of the criticisms that people always talked about, and where the whole NoSQL thing came from, was that you don't want the schema. You want this kind of schemaless approach. You have an optional schema, and the schema can evolve over time. But with a SQL database, or at least a relational database, not necessarily SQL, you have to come up with a relational model up front and define it. And I think that is where a lot of the problems come from when you have an unknown data set and a very complex data set. If it evolves over time, it's probably fine. But when you get something, it's probably much more complicated. So, as a developer, I mean, I'm coming from a relational world.
So, I'm a Postgres developer, but I understand I may need a graph database like Neo4j. So how would I get started with that?

Well, one of the best ways we have currently is our database as a service, called Neo4j Aura. We have free instances. We have different tiers, of course: a free tier and then your paid tiers above that, depending on your needs. But the free tier is a really great place to start. There's lots of tools surrounding that free tier. There's a data importer tool where you can load up PDFs or CSVs or some other types of data, and it will help you get that data into a graph. So you don't have to have that knowledge up front. And then you can query, or play around with our visualization tool called Bloom, which is kind of a natural language query interface. So you don't have to know a lot of Cypher up front. Even for the Cypher portion of it, there are guides that walk you through. We try our best to have a very low barrier to entry there for people to learn.

You mentioned Cypher. The thing that makes Cypher, from my perspective, so much better than the other graph languages is that it actually looks like ASCII art. It looks beautiful. You look at the query, and if you go a little bit deeper and use some of the more complex constructs, it's a little bit more complicated to understand if you don't know how it works. But a standard graph query over multiple nodes and relationships, you look at that and it's an arrow telling you, "Oh, here's a node. Here's a relationship. That's what I expect. And that is how many you can have between those." I just love it. Whoever came up with Cypher, thank you. Thank you, for the love of God.
Yeah, it's a super approachable query language. I had learned several years of SQL before I even knew about Cypher. And when I came over to the light side, if you will, at Neo4j, and started exploring Cypher, there were several things where it's like, why in the world isn't everybody using something like this? Because it's very easy to read, very easy to construct, at least the general starting structures, right? There's way more complex things you can do with it. There's still lots of things where I look at it and go, okay, how do I do this pattern construction and manipulation? Because patterns are very complex. But just at the outset, it's a much more approachable language, I feel, and has some really cool, fun things to do with it. And I always like to give the example that it took me learning Cypher in order to understand what the SQL HAVING and GROUP BY clause was trying to do. It was just way more apparent in Cypher than in SQL.

I agree. And I think that is where a graph database comes in in general. As I said earlier, in SQL, when you have those multi-hop relationships, you end up doing something like this weird recursive SQL. It works, but it's never going to be nice. It's a recursive common table expression with a union and a join, and I have to look it up every single time. I have used it so many times, and I always get like 95% to where I want to be. And then it just doesn't work the way I expected and I have to look it up. I've probably made some mistake on the join type or on the join clause. And with Neo4j, or in general with graph databases and specifically Cypher, it is so much easier to model that stuff. Even when you use a MERGE or something, it's still way easier.

Yeah.
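The multi-hop comparison above can be sketched with a small, self-contained example. The following Python snippet uses the standard library's `sqlite3` module to run a recursive common table expression (with the union and join mentioned in the conversation) over a toy family tree, and keeps a roughly equivalent Cypher query next to it as a string for comparison. The table names, the `Person` label, the `PARENT_OF` relationship type, and the sample data are all assumptions made for this illustration; the Cypher text is not executed here.

```python
import sqlite3

# Toy relational model: people plus a parent/child join table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE parent_of (parent_id INTEGER NOT NULL, child_id INTEGER NOT NULL);
INSERT INTO person VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cleo'), (4, 'Dan');
INSERT INTO parent_of VALUES (1, 2), (2, 3), (3, 4);
""")

# Recursive CTE: the anchor part finds direct children, the recursive
# part joins back onto the CTE to follow one more hop each round.
RECURSIVE_SQL = """
WITH RECURSIVE descendants(id, depth) AS (
    SELECT po.child_id, 1
    FROM parent_of AS po
    JOIN person AS root ON root.id = po.parent_id
    WHERE root.name = :root
    UNION ALL
    SELECT po.child_id, d.depth + 1
    FROM parent_of AS po
    JOIN descendants AS d ON po.parent_id = d.id
)
SELECT p.name, d.depth
FROM descendants AS d
JOIN person AS p ON p.id = d.id
ORDER BY d.depth
"""

# Hypothetical Cypher counterpart (shown for comparison only): the
# variable-length pattern *1.. replaces the whole recursive CTE.
CYPHER_EQUIVALENT = """
MATCH path = (:Person {name: $root})-[:PARENT_OF*1..]->(d:Person)
RETURN d.name AS name, length(path) AS depth
ORDER BY depth
"""

rows = conn.execute(RECURSIVE_SQL, {"root": "Ada"}).fetchall()
print(rows)  # [('Ben', 1), ('Cleo', 2), ('Dan', 3)]
```

The arrow in the Cypher pattern carries the same information as the anchor query, the `UNION ALL`, and the self-join in the SQL version.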
And for those of you who are not familiar with Cypher, or thinking that this is a Neo4j thing: first of all, we have openCypher, which is completely open source. We open sourced it, I believe, back in 2015. But just this year, Neo4j and several other graph database vendors all got together and came up with the ISO GQL standard, which was released, I think, a month, month and a half ago now. So there is an official graph query language standard now, and Cypher has poured a lot into that. There's a lot of things that have come over from Cypher, as well as from some other graph query languages too. So it will be an official, unified standard, of course, whenever we can get to that.

An ISO standard?

Yep.

Wow. I did not expect to see that in my lifetime. That is incredible.

It's been several years in the making. Neo4j and all the other graph database vendors have been hard at work getting that all together, but yeah, it all got approved and everything just recently.

So how does it work from a programming language perspective? I know that Neo4j has a lot of drivers. Obviously it's not a SQL interface, so you need something different than, for example, JDBC in Java or the SQL interface in Go. But I think there's drivers for almost every language I've ever considered.

Yeah. We provide official drivers for the bulk of your core languages. And then there's community drivers that are very well supported and very well maintained by partners or communities, for several other languages. And then we also do have a JDBC driver and other things too, as well as integrations to major frameworks. So our Spring Data Neo4j integration has been around forever, and several others as well.
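As a rough idea of what using one of the official drivers looks like, here is a minimal sketch with the Neo4j Python driver (the `neo4j` package), which is a separate install and needs a running database to connect to. The connection details, the `Person` label, the `KNOWS` relationship type, and the function name are assumptions for illustration; only `GraphDatabase.driver` and `execute_query` come from the driver itself, and `execute_query` assumes a recent 5.x driver version.

```python
# Hypothetical query: labels, relationship type, and property names
# are made up for this sketch.
FRIENDS_QUERY = """
MATCH (p:Person {name: $name})-[:KNOWS]->(friend:Person)
RETURN friend.name AS name
ORDER BY name
"""


def fetch_friend_names(uri, user, password, name):
    """Run FRIENDS_QUERY against a Neo4j instance and return the names."""
    # Imported inside the function so the module loads even when the
    # optional 'neo4j' package is not installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        # Query parameters are passed as keyword arguments; the trailing
        # underscore marks driver settings such as the target database.
        records, _summary, _keys = driver.execute_query(
            FRIENDS_QUERY, name=name, database_="neo4j"
        )
        return [record["name"] for record in records]
```

A call might look like `fetch_friend_names("neo4j://localhost:7687", "neo4j", "secret", "Jennifer")`, where the URI and credentials are placeholders for your own instance.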
And of course we have the big gen AI ones now, your LangChains, your LlamaIndex, and so on too. So basically anything you want to integrate with or around Neo4j has some kind of connector, integration, or driver to do it.

All right. Cool. You already mentioned Neo4j Aura. We're a cloud podcast, but we're also a Kubernetes podcast. As far as I know, Neo4j Aura internally uses Kubernetes, right?

Yes. As far as I know, yep.

As far as you know, okay. So we're probably on the same level of understanding.

Yeah. There may be some other things we do as well, but yes, we run Kubernetes and we have a very good integration and partnership there.

Okay. So that means I can also use Neo4j on Kubernetes outside of Aura?

Yeah. The thing that, at least, I didn't realize until I started digging in just a little bit is that running a database on Kubernetes is not a simple "spin up X database."

If you don't care for persistence, yes.

Right. Kubernetes is very customized, because typically you're dealing with enterprise systems and you need to mess with or customize individual components or pieces. So running Neo4j requires about four or five different components that technically would run separately on Kubernetes. And so, if you've ever heard of Helm and Helm charts, that's the easiest way. It basically just outlines: these are the services, the pieces that I need in order to run Neo4j; spin all these up together, and manage them this way, and replicate them this way. So it's actually pretty easy to get up and running with the Neo4j-provided, managed, supported Helm chart.

Interesting.
So the reason I'm saying interesting is because everyone these days talks about Kubernetes operators, and "we have the operator to set it up for you." And you say, no, here's the Helm chart. It's so refreshing. I haven't heard that in a while. I think the reason is that operators give you a lot more operationally. You can react at runtime to certain situations, whereas the Helm chart is basically just the installation. I think that is the reason why a lot of people use or move towards the operator. But that's just my guess. Maybe it's just cool to have an operator these days.

Yeah, the latest thing.

Yeah. So let me see. We talked about developers. We talked about the programming languages. We know you can run it on Kubernetes. Make sure you have a persistent volume if you run a database; we talked about that. And if you need a persistent volume provider, I heard that simplyblock might have something for you. But there's a lot of others as well. Actually, just yesterday, or on the weekend, I started a small website where you can look for all the different CSI providers, basically the volume providers that can be plugged into Kubernetes, everything that I know and found. I split them by features, and you can search. So if you're in search of a CSI provider, storageclass.info is probably what you want to look into. If you find something that is wrong, feel free to send a pull request. It's GitHub Pages. Just as a side note. Okay, because we're pretty much out of time: what do you think is the next big thing in cloud, in graph databases, in databases in general, in AI? Feel free to name two or three things as well.
Well, I think AI is kind of the big thing right now, but I think we'll start seeing that, not necessarily taper off, but integrate into our standard day-to-day rather than being the focus for everything. I think we'll see us, not go back to, but modify what was our workflow to integrate LLMs and gen AI stuff into our day-to-day things. And so it will become just a piece of the deployment puzzle, or the building puzzle, or the application puzzle, whatever it is. I think that will get standardized a little bit better. We'll figure out where the super useful applications are, and the highly critical and impactful workflows where we need to use it. And I think databases are going to be a huge component of that, whether it's graph or something else entirely. We're seeing this shift from "okay, use an LLM for everything" to realizing that an LLM has some limitations, right? And some weaknesses. But I think those are weaknesses and limitations that databases can really help mitigate. They're not going to completely solve them, but they can help mitigate them, because we have lots of good data in our data structures already. And so pairing the two together, this is where you see that retrieval-augmented generation, or RAG, concept. Pairing the database with an LLM, I think, is going to continue to improve that story.

True, true. You said how to use it best, or where to use it. Right now there is this meme going around, like, "I want my LLM to do my dishes," and, I don't know, whatever. Well, it was different: "I don't want my AI to do art and whatever. I want it to do the dishes."

Yep.
So I can engage, but yeah, I want to delegate the low-impact things to the LLM.

Exactly. Unfortunately, I can't remember exactly what it was right now, but if I find it, I'll put it in the show notes. I read that and I was like, yes, that is exactly it. Why do we give the complicated tasks, or the stuff that we love to do, to an AI instead of trying to offload the stuff we really don't like? A good example of that would probably be writing the initial documentation for stuff: looking at the source code, at the comments, and coming up with an initial draft for the documentation of that. I mean, most of us are engineers, and engineers love one thing, which is writing code, but they hate the other thing, well, love-hate the other thing, which is documentation. So maybe that is something we should look into and figure out if it helps us that way. All right. Cool. That was a pleasure. Thank you very much for being here.

Thank you so much for having me.

My pleasure. And for the audience, Jennifer prepared a demo, which unfortunately doesn't work for an audio podcast, but we'll put it in the show notes. It will show you exactly how you can set up Neo4j on Kubernetes yourself. And we may actually do a recording, so I can put that in as well. We'll see. Maybe not yet. Maybe it's somewhere in the near future.

Sounds like a plan.

Yeah, I know. Sometimes I have plans. Not a lot of times, but sometimes.

Whether they actually get implemented, who knows.

Exactly. You can always have good ideas, and there's plenty of those. Not all of them are getting implemented. All right. As I said, thank you very much. It was a pleasure. It was good to talk to you again after two years, three years.
Yeah, something like that. Time just flies.

And in person at a conference some time in the future again, I hope so. I mean, there are a lot of database conferences and a lot of Java conferences, so there's a good chance, I guess. All right. And for the audience, thank you very much for being here again. See you all next week with the next episode and the next guest. Thank you very much for being here.

The Cloud Commute podcast is sponsored by simplyblock, your own elastic block storage engine for the cloud. Get higher IOPS and low predictable latency while bringing down your total cost of ownership. www.simplyblock.io