Archive.fm

Regular Programming

About Non-CRUD

CRUD - a classic term among supposedly simple web apps. But, not always the right move? Not always all that mappable to the actual problem?

Discussed: picking spicy architectures, non-CRUD data storage needs, slovely solutions, dirty refunds, and doing the OAuth dance.

Hey, thing happened!

Finally: a story where pubsub was reasonable, and some telemetry.


Links

Duration:
29m
Broadcast on:
08 Jul 2024
Audio Format:
mp3


[background noise] Create, read, update, delete, and ding from my phone. Ding is the important part, right? Create, read, update, ding. That's what I've heard. Now, CRUD, or create, read, update, delete, is a term I hear fairly frequently when talking about supposedly simple web apps. Are you familiar with CRUD? I'm familiar with crud. It's, what's, what are you up to, my friend? What? Since you're asking me, it's like, it's one of those, you know, there's something, I did nothing, no. Yeah, so I think there's a bunch of systems where CRUD makes less and less sense, and I think that's something you kind of discover as you work, and for some systems it makes perfect sense. So it is a particular shape of dealing with data, and it's kind of, I would say it's row oriented, on the row-versus-column-oriented sort of axis, and it is kind of document oriented, because if you look at all the CRUD-oriented frameworks, like Django, Ruby on Rails, Phoenix has a bunch of this as well, even more if you use Ash, they actually usually get a little bit complicated when you start dealing with relational data. So it's not really well mapped to a relational database, but usually people do it anyway. But I guess it's the, like, oh, there are a few operations you do: you create things, you read them, you update them, and you delete them. Usually the read implies that there is a list as well. CRUD, CRUDL, I don't know. But I've recently run into a bunch of different cases where CRUD is kind of the wrong move, or the move that won't scale to the intended level. I know you've worked on a non-CRUD system for a bit. Not sure how much of that needed to be non-CRUD, but I can see why they went for non-CRUD, because you had a transport system that wanted to show a bunch of near real-time states.
It's also kind of fascinating. Before I started, maybe even before you started working on that system, it was mainly there to write data, wasn't really meant for reading data. So you just wrote some data to it and it sent the data to all places where people needed the information, and then you forgot about it. You also put it in the RethinkDB, but then forgot about it. That was all right. And then we started to want to get information out of it and all kinds of hell broke loose. So it's quite an interesting system, but also they didn't have to write it that way. They didn't have to architect it that way. Yeah, I think some of what they did, which was, so the first step was, like, if you wanted to write some data, I believe you broadcast a new update. Yeah, pubsub dot broadcast entity. Yeah. And the first step of the broadcast was to write it to the database through a cache, through an in-memory cache, because the cache was a distributed in-memory cache. Yes, I was just going to say that we knew that the cache had some guarantees. This is, of course, a lie through and through, because the cache used the Mnesia dirty writes. Yes, all the dirty functions, the dirty-prefix functions, because performance, yeah. And so the first thing it did was a dirty write, then a dirty read to get the data back again. So you just have to hope. What's the deal? Do you have dirty-read-after-dirty-write consistency in Mnesia? Exactly. What are the semantics of that? We found out, when we put lots of load on the system, that we don't have dirty read after dirty write. The semantics of dirty read, dirty write? There's no consistency there. It's just luck and good vibes. So when there's low load and good vibes, you get the stuff back you expect to get back. When there's high load, it starts to just fall apart in hilarious ways.
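The dirty write-then-dirty-read dance described above can be sketched directly against Mnesia, which ships with OTP. A minimal sketch, assuming a made-up `:rides` table; note that nothing here actually guarantees the read sees the write under load:

```elixir
# Start Mnesia with a RAM-only schema (no create_schema, no disc copies).
:ok = :mnesia.start()

# A hypothetical in-memory table for ride state.
{:atomic, :ok} = :mnesia.create_table(:rides, attributes: [:id, :state])

# dirty_write skips transactions and locks entirely: fast, no guarantees.
:ok = :mnesia.dirty_write({:rides, 42, :in_progress})

# dirty_read straight after: under low load this usually returns what was
# just written, but Mnesia promises no read-your-dirty-writes consistency.
[{:rides, 42, state}] = :mnesia.dirty_read({:rides, 42})
IO.inspect(state)
```

Under a transaction (`:mnesia.transaction/1` with `:mnesia.write/1` and `:mnesia.read/1`) the same sequence would be consistent, at the cost the episode describes: it's slower.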
But after that, the just-dirty-read data was written to a database that was slower and had even fewer consistency guarantees. It probably had more, but we never really found them. So, yeah. Yeah. And that's an unusual system, and it's not typical, and I see why they made some of the choices. Yeah, I can empathize with them, but I don't like them. Not anymore. I still have scars here. Look. Yeah. You have some grievances. Absolutely. I was much happier about this a couple of years ago. I could really see why they did what they did, but now I'm just old and grumpy and go, no, what's wrong with request-response transactions, or Postgres? Yeah. Yeah. And I think a lot of people try a spicy architecture and they go, oops, this didn't turn out so well. We should have just used Postgres. And I think in many cases that's a good wisdom or a good takeaway. But then there's, if you end up doing, for example, analytics. So this was something I was confused by when I looked into Plausible Analytics. They are, or were, open source, built on Elixir, and I looked into what they use for analytics. And I was like, hmm, I assume they used Postgres, maybe Postgres with TimescaleDB in it or something. But no, they used ClickHouse, and I was like, why do you use a different database I've never heard of? This is upsetting. And then I read up on it, and it's like, okay, so it's a column-oriented analytics database. Well, I guess I assume that's faster somehow. And that was very much the case. I think they might even have started on Postgres, but had problems that made them switch. Or, they used Postgres for a bunch of things, like storing customers and users and all of that, but ClickHouse was the workhorse for their analytics workload.
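The row-versus-column distinction is easy to sketch in miniature. A toy illustration with invented field names: the same "pageview" records stored both ways, where the aggregate only has to touch one column in the columnar layout:

```elixir
# Row-oriented: one map per record, as a CRUD framework would hand you.
rows = [
  %{path: "/", duration_ms: 120},
  %{path: "/about", duration_ms: 80},
  %{path: "/", duration_ms: 100}
]

# An aggregate over a row store walks every record and plucks one field.
avg_rows = Enum.sum(Enum.map(rows, & &1.duration_ms)) / length(rows)

# Column-oriented: each field is its own contiguous sequence, so an
# aggregate reads only the column it needs and skips the rest entirely.
columns = %{path: ["/", "/about", "/"], duration_ms: [120, 80, 100]}
avg_cols = Enum.sum(columns.duration_ms) / length(columns.duration_ms)

IO.inspect({avg_rows, avg_cols})
```

A real column store like ClickHouse adds compression and vectorized execution on top, but the shape of the win is the same: "what's the average?" never has to load the paths.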
And when I was at NervesConf this year, my buddy Alex McLean talked about building IoT products with Nerves, and he talked about data storage and how the impulse, if you come from the webs, is like, okay, I guess we'll use SQLite. SQLite is a great database. But for embedded, it's maybe not where you should start. So one reason is you have a file system and you can just put things there. So for some things, you can just use a file. And then if you step up one or two, you can use, I don't know if these are kind of steps on a ladder, they're probably kind of parallel and slightly different, Nerves ships something called PropertyTable, which I believe is ETS under the hood. But what it does is also you can subscribe to changes, but it's a key-value store with tables, which is practical. Nice. Sounds very useful. Actually, it might be an arbitrary-depth path-key kind of deal, or it's just Erlang terms. I haven't dug into PropertyTable a lot. I've used it a little bit at work, but fundamentally it's just like you can put values in there and get values from it, and you can find out if they change by message passing, which is a very, very useful pattern inside of a device. It's like, oh, we changed the sleep state. There are a few different things that should probably change their entire vibe if the sleep state changes: maybe we should switch an LED, maybe we should dim a screen, maybe we should pause the music, all sorts of things. But it's a single little value, and it would be weird to have it in a Postgres table or a SQLite table. Then you would have some table called keys-and-values or something, settings, states, or some other nonsense table. Isn't this the infamous wp_options table? Oh, yeah. In WordPress. Yeah, it might be. It could be. Yeah. Everything. Everything, especially encoded as JSON, or encoded as, like, the PHP version of pickle.
It's just serialize and deserialize, I think. Yeah. Yes, absolutely. Go for that all day. But, so, let's not put that on a device. And then there's CubDB, which is just a pretty cool key-value store built in Elixir that lives in a file. It's built on, like, the RocksDB thing, which I think came from the DynamoDB paper, or the Dynamo paper. Is that the open source version of DynamoDB? Yes, kind of, probably. By some completely different people. Yeah. It's essentially like a key-value store that people like building databases on top of. Yeah. That's just one of those, but built in Elixir. So that's an option. That can be nice. You get a little bit more feature set compared to just straight files, for example. And then there's SQLite. So if you need schema migrations, and you want to deal with schema migrations, if you want querying, you want SQL, or everything that Ecto offers, you can use parts of Ecto without a relational DB, but yeah, then you could pick SQLite for sure. There's nothing particularly wrong about that. But then there's also a bunch of cases where you might just want a bunch of ephemeral state. So I think that was part of what they were going for with the system you've worked on: that fundamentally, the current state of a ride across town is not important to anyone after now. Indeed. It's only important now. Yeah. And it's a transient state, and for reasons related to analysis and data and being able to make smarter choices, it would be interesting to have the history of states that have been passed through, so you could do analysis and yada yada. But the current sort of window of the world is not that interesting to store, as long as you can recalculate it or get a new one very quickly.
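A "put values, get values, hear about changes by message passing" store of the kind PropertyTable provides can be sketched as a small GenServer. This is a hypothetical miniature for illustration, not the real PropertyTable API:

```elixir
defmodule TinyProps do
  @moduledoc "Toy PropertyTable-style store: key-value plus change notifications."
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)
  def put(pid, key, value), do: GenServer.call(pid, {:put, key, value})
  def get(pid, key), do: GenServer.call(pid, {:get, key})
  def subscribe(pid, key), do: GenServer.call(pid, {:subscribe, key, self()})

  @impl true
  def init(:ok), do: {:ok, %{values: %{}, subs: %{}}}

  @impl true
  def handle_call({:put, key, value}, _from, state) do
    # Notify every subscriber of this key instead of making them poll.
    for sub <- Map.get(state.subs, key, []), do: send(sub, {:changed, key, value})
    {:reply, :ok, put_in(state, [:values, key], value)}
  end

  def handle_call({:get, key}, _from, state) do
    {:reply, Map.get(state.values, key), state}
  end

  def handle_call({:subscribe, key, sub}, _from, state) do
    {:reply, :ok, update_in(state, [:subs, key], &[sub | &1 || []])}
  end
end

{:ok, pid} = TinyProps.start_link()
:ok = TinyProps.subscribe(pid, :sleep_state)
:ok = TinyProps.put(pid, :sleep_state, :asleep)

# The subscriber (this process) is told about the change by message passing.
receive do
  {:changed, :sleep_state, new_state} -> IO.inspect(new_state)
after
  1_000 -> raise "no change notification arrived"
end
```

It's all ephemeral by construction: restart the process and the state is gone, which, as discussed, is often exactly what you want on a device.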
So if you lost the location of this ride, as long as you know there's a ride in progress, and something like the driver's app will tell the back end that it's still happening, then that would probably be fine. So sometimes you have kind of ephemeral data, and I think in embedded systems especially so, because there's always like, oh, we read this from a sensor, or currently we can't connect to this thing, or we got a bunch of messages about this and we're going to make a decision off of it. But it's not something we will persist. There's nothing to store, or this is information we want to keep in memory right now, but it's information we would rather forget on restart. That's fairly common in embedded devices. And that's very non-CRUD. Generally, you don't create a new entity and you don't update a specific entity; in many cases, it's a global key-value store that's desirable, or a bunch of them. And then there's, yeah, the analytics thing, where it's like, oh, the most important operations we do are calculations across a lot of these numbers. So we could optimize the entire DB for making calculations across many of these numbers, instead of for dealing with many records. And that's kind of where you end up if you're doing sort of more data-engineering-type stuff. You see plenty of row-oriented pieces of data, like a lot of records, when you do data engineering as well. But most of the processing you want to do, aside from formatting and typing things up and schemas and stuff, is probably akin to doing analysis of the data. How many, what's the average, what's the standard deviation, what's the P99 or whatnot? And then a column store is much more efficient, usually. Have you run into any other CRUD-less systems that spring to mind? Yeah, I've been working on some batch processing systems. They don't do much CRUD.
You put YAML in one end, and it spits out all kinds of JSON and YAML and configuration files and logs and stuff. I really like the batch idea, because there's lots of margin in it. It just runs once. If it's possible to make sure that it's only running one at a time, there's not much that can go wrong. So that's very non-CRUD. I don't know, does it count? I just heard the sentence "not much that can go wrong" about batch files. And YAML. That's YAML. Yeah, sure. Okay. But the idea, if we ignore what I said and listen to what I meant, is that the batch... You like batches. Yeah, the idea. You have things to do. Yeah. Yeah. It just goes. And there are also other patterns associated with it that are nice. Like, put an inbox directory somewhere on a file system, and you either have an inotify thing on Linux that checks for changes in that directory, or you run it once a minute or something, and take care of all the files that are put there. That's also quite lovely. It's slow, but lovely. It's slovely. It's slovely. Yes. But that could be CRUD, you know. Let's see: upload an order for socks, and it goes through the CSV file and creates things in the database, order items or something like that. So it's very transactional, come to think of it. The tricky part about orders and stuff is that you always end up talking to other systems. Yeah. So you always end up with what's actually distributed transactions, which you probably don't enjoy. It's like, okay, but we could always change this value back if this thing fails locally. But we've already dispatched the label maker order to this label maker API or whatever. Can we un-order a label? Can we roll back a request for a label? So in what order does it make sense, and where could it fail, and so on? What are the semantics of "actually, we don't want to charge the customer anymore" after charging? We send out a flying pig with a small sack of gold coins.
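The inbox-directory pattern mentioned above, minus inotify, can be sketched as a single polling pass. Paths and file names here are made up for the sketch; a real version would run this on a timer or hang it off a file-system watcher:

```elixir
# Hypothetical inbox layout: files land in inbox/, finished files move
# to inbox/processed/ so the next run skips them.
inbox = Path.join(System.tmp_dir!(), "inbox_demo")
done = Path.join(inbox, "processed")
File.rm_rf!(inbox)
File.mkdir_p!(done)

# Pretend someone uploaded an order for socks into the inbox.
File.write!(Path.join(inbox, "order-1.csv"), "socks,2\n")

# One polling pass: handle every CSV currently sitting in the inbox.
for file <- Path.wildcard(Path.join(inbox, "*.csv")) do
  # "Process" the file; here we just count the order lines.
  lines = file |> File.read!() |> String.split("\n", trim: true)
  IO.puts("#{Path.basename(file)}: #{length(lines)} order line(s)")

  # Move it out of the inbox so the work happens exactly once per file.
  File.rename!(file, Path.join(done, Path.basename(file)))
end
```

The move-when-done step is what buys the "only running one at a time, not much can go wrong" margin: a crash mid-run leaves unprocessed files in the inbox for the next pass.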
What are the dirty-refund-after-dirty-charge semantics? Yeah, they're dirty. Dirty money all over the place. Yeah. But I think you made a pretty good t-shirt for, like, an Erlang conference. Yeah, I know, right? Mnesia. And then: do we have dirty-read-after-dirty-write consistency? I might be thrown out of the community after that. Only one way to find out. I think you'll definitely run into people that would be concerned, in one way or another. Yeah. Are you okay? Did you build systems for anyone with this? Yeah. I did remove all the dirty calls, and then everything started to work in a more stable but slower fashion. Was it noticeably slower? I assume it would benchmark slower, but that doesn't necessarily matter. Yeah, we did some benchmarking, because we had to wait a year for the next spike, because we're very seasonal. But we tried it by sending a huge amount of requests against it, and it behaved. And on the other hand, there were lots of parts of the system that could just fall over, so we don't really know. It was a hard system to benchmark, or just understand, because everything affected everything, because of this broken Pub/Sub pattern. So have you ever encountered a Pub/Sub that was reasonable? Sure. I think Pub/Sub is a very useful messaging pattern. I use it all the time. Like Phoenix PubSub: it's just a matter of, okay, things that need the information can subscribe to the channel and get updates when new information is there. I've used it for kind of fan-out workers, I've used it for a decent number of things. And usually it's not a problem. This one was a bit weird in that it really deeply commingled Pub/Sub, read, and write, which is not necessarily bad. The Ash framework actually offers Pub/Sub on writes. I think you have to opt in for your particular write trigger. I don't recall the details. No. Maybe it's the fact that you write a subscribe that makes sure that a broadcast will be done, or... I don't recall the mechanism exactly.
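The reasonable kind of Pub/Sub, subscribe to a topic and get updates when there's news, can be sketched with Elixir's built-in Registry as a tiny local stand-in for Phoenix PubSub (the topic name is invented):

```elixir
# A duplicate-keyed Registry lets many processes register under one topic.
{:ok, _} = Registry.start_link(keys: :duplicate, name: MyPubSub)

# Any process that cares about a topic subscribes to it...
{:ok, _} = Registry.register(MyPubSub, "rides:42", [])

# ...and a publisher broadcasts without knowing who is listening.
Registry.dispatch(MyPubSub, "rides:42", fn subscribers ->
  for {pid, _value} <- subscribers do
    send(pid, {:ride_update, 42, :arrived})
  end
end)

# This process subscribed above, so the update lands in its mailbox.
receive do
  {:ride_update, id, status} -> IO.inspect({id, status})
after
  1_000 -> raise "no update delivered"
end
```

Phoenix.PubSub has the same subscribe/broadcast shape but also works across nodes; the decoupling is the point, and so are the mailboxes it quietly introduces.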
For example, essentially you can subscribe to any action taken in Ash. And Ash is kind of a CRUD framework, but it tries not to focus on CRUD, I think. And I think that's to its advantage. So by default, it will offer you, like, you can just enable a create, read, update and delete action for your resource, but you are kind of encouraged to make custom actions for the specific things you do to a resource. So, let's say for conference booking, rather than create a booking, register would be a reasonable action: you might register for your conference, or register a booking. Which would be a create. It is of the type create, because it's not changing an existing one, it's making a new one. But it's named in some reasonable way, and there might be other creates that do it differently, because it might be, oh, this is something for a back-end booking agent and they will provide more information up front, or less information, or this is for the volunteer. So you can expose different APIs in different directions, essentially. And Ash offers a bunch of nice libraries for auth. So I was dealing with the Google auth libraries, and I was doing some chat integration. So when someone went in the chat and tried to talk to the bot, I needed to do an OAuth dance with them. So I could tell them, oh, here, go do this OAuth dance, here's the link. They click through the link, they do the OAuth dance. Now I need to pick up the fact that their account was created, because if the account wasn't created... or, right, I didn't actually pick up whether the account was created. I picked up whether the account created a logged-in session. And I could subscribe to the Ash event for that particular resource being created in a particular action. So I could essentially subscribe to "someone logged in". And then specifically, I want to subscribe to this person logging in, for just this period of time.
So this was a little bit of a stateful thing, because I offered them a login after they asked for something. So if they asked this chatbot for something and they weren't logged in, the chatbot could go, sure, but you have to log in first. Otherwise they couldn't use their API calls and stuff. So they go through that, and then the bot can go, oh, hey, now you're authed. I can proceed with what you told me originally. And that was a very short publish-subscribe, but it was a Pub/Sub. And it was mostly the fact that, like, you can ask for messages. It's a decoupling of the part that wants messages and the part that sends the messages. So in many cases when I've worked with Elixir, you do a lot of message passing, and then Pub/Sub becomes a very important pattern, because sometimes you don't want to have one pid or a list of pids. You just want to go: hey, thing happened! To whomever it may concern. And then you've suddenly decoupled a part of your system. Sometimes that's the right move, and sometimes that's the wrong move, because of course, any time you introduce Pub/Sub, you're introducing queues. Yes, but they're there anyway. Well, when you introduce message passing, you introduce queues. Yes. And that's why you might not want message passing sometimes. Logger is an interesting example in Elixir where log handlers are synchronously called functions. They're kind of interesting, because when you add a log handler to Logger in Elixir, do you know how that works? Do they work the same way as telemetry? Oh, yeah. Oh, no, it's me screwing up. Yes, telemetry was what I meant. Telemetry. Let's restart. So do you know how telemetry works, and how telemetry handlers are handled? Well, I will talk about them even if you know how they work, but yeah: since an Erlang term can be a function, and you can put any term in ETS, and ETS is very efficient for concurrent access.
When you register a handler, they just take the function and they shove it into ETS. And it's like: when you get this telemetry event, call this function. And those handlers are called just one after another, completely inline with whatever code is making the call to telemetry, which means you should do as little work as possible. You probably do something message-passy or something ASAP. But if you have a ton of telemetry events and you generate a ton of messages, you can have all sorts of bottlenecks and problems. So if telemetry had the opinion that, oh, this is just message passing, it could not be fixed for kind of aggressive cases, and it would be extremely limiting. But now it's just kind of up to you to make sure your handler is nice and snappy. Yeah, isn't telemetry working the other way around compared to the logging framework? Or is this just a dream from somewhere? Well, I thought that the logging framework does a lot of message passing, just to get rid of this taking up time and space from the thread doing the logging. So whenever you're doing IO... are you thinking of when you're actually printing the logs? Because then you're dealing with the group leader, and that's kind of an IO subsystem. And I think that's pretty message-passing driven, probably. Yeah, maybe it's just me who has created a headcanon of the people who did the telemetry stuff looking at logging and going, let's not do that. Well, it looks like Logger was made before the telemetry implementation. So someone could have gotten inspiration from the pain of working with Logger. Yeah, it could be related. I think also telemetry is expected to put out more stuff than Logger. But I don't know if they've done anything to Logger to have similar things, because it's definitely a similar problem area. Yeah, I know some cool things have happened to Logger lately, like in the last three years, but I don't remember what.
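The mechanism described here, handlers as plain functions stored in ETS and called inline by the emitter, can be sketched without the real :telemetry library. The names and event shapes below are made up; the sketch just demonstrates that the handler runs in the emitting process, not in some separate consumer:

```elixir
# A public named table holding {event, handler_fun} entries; duplicate_bag
# allows several handlers per event, as telemetry does.
:ets.new(:handlers, [:named_table, :public, :duplicate_bag])

attach = fn event, fun -> :ets.insert(:handlers, {event, fun}) end

execute = fn event, measurements ->
  # No message passing: each handler runs right here, in the caller,
  # which is why handlers must stay cheap and snappy.
  for {^event, fun} <- :ets.lookup(:handlers, event), do: fun.(event, measurements)
  :ok
end

me = self()

attach.([:db, :query], fn _event, m ->
  # Proof the handler ran inline: it observes the emitter's own pid.
  send(me, {:handled, m.duration_ms, self() == me})
end)

:ok = execute.([:db, :query], %{duration_ms: 12})

receive do
  {:handled, duration, inline?} -> IO.inspect({duration, inline?})
after
  1_000 -> raise "handler did not run"
end
```

A slow handler here would stall the code path that emitted the event, which is exactly the trade-off the episode points at: the library pushes the "go async if you need to" decision onto the handler author.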
So it's great to have that internet brain. Yeah, I can Google for it. Sure. Because I've only saved what to Google for, not the actual information. "Cool changes, Logger, last three years." Yeah. Elixir! Erlang! Go, go, go.