Archive.fm

Augment Stay Human

The Crowdstrike Incident: A developer mistake explained in 3 minutes

Duration:
3m
Broadcast on:
21 Jul 2024
Audio Format:
mp3

But by 5:27 a.m., just an hour and 18 minutes later, CrowdStrike had already implemented and deployed the fix to the problem. They realized it wasn't an attack at all. It was simply a mistake with the recent update.
Welcome to Augment Stay Human, the show where we explore how to supercharge our human potential in the age of AI. I'm your host, Chris Littieri, also known as Bits of Chris. As a staff engineer, former day trader, and advocate of using AI as Augmented Intelligence, I'm here to help you leverage AI to think clearer, learn faster, and balance success with fulfillment. Remember, Augment Stay Human.
When you have airlines, banks, hospitals, all having computer issues at the same time, that's a little scary, isn't it? It's easy to think this is an attack, but in reality, what happened on July 19th with the CrowdStrike incident was much less exciting than that. It was a simple update that wasn't properly tested.
CrowdStrike makes a suite of software security products for you to install on your machine. The Falcon sensor product is CrowdStrike's vulnerability scanner. It runs on your computer and scans at the operating system level, looking for certain vulnerabilities. Now, just like your phone or computer gets regular updates, CrowdStrike will regularly push updates to its Falcon software. On July 19th, they pushed one out that happened to have a mistake in the code, which would cause certain machines running Windows to crash.
Based on CrowdStrike's incident report, at 4:09 a.m. UTC, they pushed this regularly scheduled update. But by 5:27 a.m., just an hour and 18 minutes later, CrowdStrike had already implemented and deployed the fix to the problem. They realized it wasn't an attack at all; it was simply a mistake with the recent update they had pushed to all their customers. This update did not break any devices running Mac or Linux; it just happened to affect Windows machines. Think of it like this.
Imagine you're baking cookies for a neighborhood bake sale. But instead of using sugar, you accidentally use salt, and you don't realize it until you've dropped the cookies off at the bake sale and you're back home. So some people have already taken the bad cookies and now have to eat them and spit them out. That's what happened with CrowdStrike's update. They pushed the update to all their clients' computers, and once those computers downloaded that update, they began to crash. And any time those systems tried to reboot, they would crash again because that update was still persisting. So after the fix was applied by CrowdStrike, it required somebody to manually go and restart the machine. For Windows, that meant you had to restart it in safe mode and apply the update. And that's why it took so long for so many systems to come back online.
So while it was scary that a lot of systems were going out at the same time, all that really happened was that a bad update by CrowdStrike was pushed to clients running the Falcon software. That caused Windows machines to crash. The reason it took so long to come back online was that it required a manual step on every machine. Tech companies make mistakes all the time, but this one was scary because it happened so fast and it happened to so many computers at once. And it required manual intervention to fix. One small error in a tiny update impacted millions of computers and had real-world effects on people. The good news is that tech companies learn from these mistakes. We're constantly improving our processes to prevent things like this from happening again.
And so what can we take away from this? Well, when you hear of a major tech incident, just realize these kinds of mistakes happen. And the ripple effects they have are just a function of how reliant on and how interconnected we are with technology. You don't need to jump to the conclusion that it's a hack. It could just be human error, and it likely is.
And while these incidents can be scary and disruptive, they do push companies to be better. Tech companies have a financial incentive to not let things like these happen again.