Imagine starting a Friday morning expecting a relaxing day off as you prepare for vacation, only to be thrown into chaos. Your phone is blowing up with frantic messages, and when you try to log into your Microsoft email, it’s mysteriously down. You try to brush it off, thinking, “What can I do? I’m on vacation anyway.” You decide to take a break from the mess. With your flight a few hours away, you order a coffee as a treat to take your mind off the morning’s mayhem. But the mobile ordering system isn’t working, and technology fails you yet again. When you arrive at the airport, you’re greeted with even more disorder and the news that your flight has been canceled and you can’t be rebooked on a new one. You might consider it a streak of bad luck, but what if it was more than a coincidence? What if all this havoc was caused by a small bug in a CrowdStrike update? This was the reality many experienced this summer in what Australian web security consultant Troy Hunt described as “the largest IT outage in history.”
What happened?
Founded in 2011, CrowdStrike is a multi-billion-dollar cybersecurity technology company based in Austin, Texas. While it has played a role in investigating high-profile cyberattacks like the 2014 Sony Pictures hack and the 2016 Russian attacks on the DNC, on July 19, 2024, it found itself in a predicament of its own. One with a huge impact.
According to CrowdStrike’s “Preliminary Post Incident Review,” a routine content configuration update for the CrowdStrike Windows sensor inadvertently caused a system crash, leading to widespread disruptions. The update was intended to enhance telemetry on emerging threat techniques, and updates of this kind are a regular part of the Falcon platform’s dynamic protection mechanisms. The Falcon platform is a cloud-based endpoint protection platform that combines threat hunting, endpoint detection and response (EDR), and antivirus services. An undetected error in the Rapid Response Content update triggered the crash, which affected Windows hosts running sensor version 7.11 and above.
What was the impact of the outage?
Because CrowdStrike is one of the largest cybersecurity companies, the outage disrupted a wide range of users and systems. For example, more than 3,000 flights within or out of the United States were canceled, and more than 11,000 were delayed.
The American Hospital Association also described how hospitals experienced varied impacts, ranging from minimal disruption to significant issues affecting medical technology, communications, and emergency services.
CrowdStrike’s Response and the Importance of Transparency
Despite the concerns and reservations, which led to a roughly 22 percent drop in its stock, CrowdStrike responded immediately and effectively by maintaining transparency and clear communication throughout the incident. On the day of the outage, CrowdStrike’s founder and CEO, George Kurtz, published a blog post addressing the defect in the Falcon content update for Windows hosts. He clarified that the issue was not a cyberattack and that it had been quickly identified and fixed. We also spoke with James McGregor, U.S. GTM Specialist – Cisco Security at TD SYNNEX, who described the incident as “more of a networking thing than a security issue.”
In times of crisis, transparency is crucial for maintaining trust and confidence among customers and partners. By openly addressing the problem and providing updates, CrowdStrike demonstrated a commitment to honesty and accountability. Kurtz emphasized, “Nothing is more important to me than the trust and confidence that our customers and partners have put into CrowdStrike.” This commitment underscores the importance of transparency in reinforcing trust and loyalty. As McGregor highlighted, “Transparency is crucial in these situations. When a vulnerability is discovered, timely and clear communication is essential. CrowdStrike did a perfect job demonstrating this by promptly addressing the issue and avoiding attempts to hide it. This openness not only helps manage the impact but also reinforces trust and shows a commitment to resolving the problem effectively.”
Even though the incident was undoubtedly significant, it’s important to understand that such an event can occur with any vendor. Instead of jumping ship, partners should focus on how vendors respond and learn from the incident. CrowdStrike’s dedication to accountability and partner trust was demonstrated by its prompt and transparent response.
Recommendations and Next Steps
For partners looking to provide an additional layer of security and comfort to their customers, McGregor recommends Cisco XDR as a strong addition to the toolkit. McGregor said, “Cisco XDR is an extended detection response platform, and it acts as a single pane of glass across a customer’s entire security infrastructure.” While the CrowdStrike incident was due to a faulty update rather than a cyberattack, Cisco XDR offers significant benefits by enabling fast detection and rapid alerting, which support a swift response to future events.
Some benefits of Cisco XDR that make it an excellent choice are:
- Integrated Threat Detection: XDR integrates data from network, endpoint, cloud, and email security, providing a view that enhances threat detection across the board
- AI-driven Efficiency: XDR leverages AI to prioritize and respond quickly to threats, reducing manual effort and improving incident handling
- Flexible Integration and Support: XDR seamlessly integrates with various security tools and third parties, including CrowdStrike
This incident has also highlighted the need for disaster recovery and backup services. In today’s unpredictable digital landscape, disaster recovery should be a core component of any company’s strategy, so that the business stays resilient and is ready for unforeseen circumstances.
Key Takeaways from the Outage
The recent CrowdStrike outage is a reminder of how vulnerable our systems are. What was supposed to be a routine update led to widespread disruption. Despite initial fears, we now know that the chaos stemmed from a glitch rather than malicious intent. CrowdStrike’s transparency and responsiveness during this time helped ease concerns and maintain trust.
One of the main takeaways from this incident is the importance of timing when implementing updates. As McGregor pointed out, “CrowdStrike broke the cardinal rule of system updating, which is don’t push updates on Friday afternoons, especially on Friday mornings… If there’s a way to do that either at night or over the weekend.”
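For teams that want to turn the timing advice above into a simple guardrail, here is a minimal Python sketch of a check that only allows an update at night or over the weekend. The function name and the night-time hours are assumptions made for illustration. They do not come from CrowdStrike, Cisco, or McGregor, and a real change-management policy would likely also account for time zones, holidays, and staged rollouts.

```python
from datetime import datetime

# Hypothetical policy values, chosen only for illustration.
NIGHT_START_HOUR = 22  # 10 p.m. local time
NIGHT_END_HOUR = 5     # 5 a.m. local time


def is_in_update_window(when: datetime) -> bool:
    """Return True if `when` falls at night or on a weekend."""
    # datetime.weekday() returns 0 for Monday, so 5 and 6 are Saturday and Sunday.
    is_weekend = when.weekday() >= 5
    # The night window wraps past midnight, so either condition qualifies.
    is_night = when.hour >= NIGHT_START_HOUR or when.hour < NIGHT_END_HOUR
    return is_weekend or is_night


# July 19, 2024 was a Friday; 9 a.m. is an arbitrary example time.
print(is_in_update_window(datetime(2024, 7, 19, 9, 0)))    # False
# The following Saturday morning falls inside the weekend window.
print(is_in_update_window(datetime(2024, 7, 20, 9, 0)))    # True
# A Wednesday at 11:30 p.m. falls inside the night window.
print(is_in_update_window(datetime(2024, 7, 17, 23, 30)))  # True
```

A check like this could sit in front of a deployment script so that a push outside the agreed window needs an explicit override.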
Author
The EDGE360 editorial team consists of Jackie Davis, Katherine Samiljan, and Jessica Nguyen. You can reach the team at EDGE360@gotostrategic.com.