Overview
In this post, we're going to look into the CrowdStrike outage that happened early last Friday morning.
As you were waking up and getting ready to start your day on Friday, July 19th, the world of IT was already in a frenzy. Broadcast channels were down, over 17,000 flights were canceled, and banks were not able to operate. You may ask what could cause such a thing. An update. Just an update.
Who is CrowdStrike and What Happened?
CrowdStrike is a cybersecurity technology company that provides endpoint security, threat intelligence, and cyberattack response services.
On Friday, July 19th, around 3:00 AM, CrowdStrike pushed out a content update for one of its endpoint protection products. Endpoint protection software is meant to protect your PC from viruses, malware, and basically anything malicious. Most of these products are considered next-generation antivirus. The goal is to stay one step ahead of any hacker or criminal, so the software is updated constantly to detect new kinds of malicious activity.
Updates like this have been pushed out for years without issue. This is the way it should go; this is every IT professional's dream: to push out an update, make a configuration change, and have everything go as planned. But sometimes, you have to expect the unexpected. And Friday was that day.
The update that was pushed out Friday had a software bug that caused a problem on Microsoft Windows systems. The bug made affected PCs crash and display the Blue Screen of Death.
If you don't know, the Blue Screen of Death is the last screen you want to see when you reboot your PC. It means Windows has hit a critical error and cannot continue. The problem hit a lot of CrowdStrike's customers, and as one of the top companies in its field, CrowdStrike has a lot of customers.
In one article I saw, a hospital was among the first to report the issue. As the morning went on, it spread to organizations around the world. As of this recording, the recovery is still under way, though most of the problems have been resolved. One workaround was to boot the PC into safe mode and then delete the file that the update had delivered.
This meant a long day if you are in IT support. When a PC is stuck on a blue screen, support can't remote into it, so someone has to be physically at the machine to carry out the fix. That can take a lot of time, depending on the size of the company you support.
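The delete step of that workaround can be sketched in a few lines of Python. This is only an illustration of the idea, not CrowdStrike's official procedure: the function name `delete_matching_files` is hypothetical, and the directory and file pattern are left as parameters because this post does not name the file. The demo runs against a temporary folder with made-up file names so that nothing real is touched.

```python
import tempfile
from pathlib import Path


def delete_matching_files(directory: str, pattern: str) -> list[str]:
    """Delete every file in `directory` whose name matches the glob `pattern`.

    Returns the names of the files that were deleted, sorted alphabetically.
    """
    deleted = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            path.unlink()
            deleted.append(path.name)
    return deleted


if __name__ == "__main__":
    # Demo in a throwaway folder; the file names below are invented.
    with tempfile.TemporaryDirectory() as folder:
        for name in ("bad-update-001.dat", "bad-update-002.dat", "keep-me.dat"):
            (Path(folder) / name).write_text("placeholder")
        print(delete_matching_files(folder, "bad-update-*.dat"))
        print(sorted(p.name for p in Path(folder).iterdir()))
```

Even with a script like this, someone still has to get each machine into safe mode first, which is why the fix took so much hands-on time.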
My Opinion
When something like this happens, I think every IT professional can feel their pain. Like I stated before, most changes are planned, and the engineer has tested, or at least should have tested, the change before deploying it. Even when that is done, it's still possible to have issues.
One thing that stood out to me was this: why push an update on a Friday? It's an unwritten rule in IT that no changes are made on Friday. That's because you don't want to make a change and then get woken up on Saturday morning by problems your change or update caused.
Or, if you have to push an update on Friday, why not section it out to only a few clients first and, if everything goes well, send it to the others on another day? Other than that, I think they handled it the best that they could.
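To make the "section it out" idea concrete, here is a minimal Python sketch of a staged rollout. It is an assumption-laden illustration, not a description of how CrowdStrike deploys anything: `staged_rollout`, `deploy_and_check`, and the 1% / 10% / 100% stages are all hypothetical names and numbers picked for the example. The update goes to a small group first, and the rollout stops before the next group if any machine in the current group reports a problem.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy_and_check: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 100),
) -> tuple[list[str], bool]:
    """Push an update in growing stages and halt after the first unhealthy stage.

    `deploy_and_check` is a hypothetical callback that updates one host and
    returns True if the host is still healthy afterwards.
    Returns (hosts that received the update, whether the rollout completed).
    """
    updated: list[str] = []
    for percent in stage_percents:
        # Cumulative number of hosts covered by the end of this stage (rounded up).
        target = min(len(hosts), (len(hosts) * percent + 99) // 100)
        batch = list(hosts[len(updated):target])
        results = [deploy_and_check(host) for host in batch]
        updated.extend(batch)
        if not all(results):
            return updated, False  # stop here; the remaining hosts are never touched
    return updated, True


if __name__ == "__main__":
    fleet = [f"pc-{i}" for i in range(200)]
    # Simulate an update that breaks every machine it lands on.
    touched, completed = staged_rollout(fleet, lambda host: False)
    print(len(touched), completed)
```

In this simulation a bad update reaches only the first small group of the 200 machines before the rollout halts, instead of all of them at once.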
My heart goes out to all the IT support folks who were tasked with touching each PC on their network. I have been through that process before, and it's nothing I would want to do again.