Notes: CrowdStrike - A Dive Into The Impact

[Image: digital chaos - the Windows system crash caused by CrowdStrike]

Summary

  • CrowdStrike's software update caused a widespread system crash, impacting critical infrastructure globally.
  • The incident stemmed from a configuration update for the Falcon EDR sensor agent and was not related to its NGAV.
  • The core issue going forward is that CrowdStrike knows what caused the crash but not why or how this standard, routine process malfunctioned.
  • In this Note we discuss the possible implications for CrowdStrike's business - both the EDR and non-EDR divisions.

Vendors discussed: CRWD, S, PANW, MSFT, AMZN, GOOGL

In recent days, the cybersecurity world has been shaken by an unexpected event involving CrowdStrike, a leading provider of cloud-based security solutions. This incident has sparked extensive discussions and analyses across various platforms, including mainstream media, cybersecurity forums, and financial circles.

What happened?

CrowdStrike's software update triggered a widespread system crash, resulting in the infamous Blue Screen of Death (BSOD) for Windows machines globally. The impact was significant due to CrowdStrike's extensive market presence, affecting corporate PCs and servers alike. Consequently, critical infrastructure such as hospitals, airports, train stations, and numerous consumer-facing IT systems experienced downtime.

Several interesting details have either not been highlighted elsewhere or have been somewhat misinterpreted. Here are a few of them:

  1. Root Cause: The crash stemmed from a configuration update for CrowdStrike's Falcon EDR (Endpoint Detection and Response) sensor agent. Contrary to common assumptions, it was not related to Next-Generation Antivirus (NGAV) or traditional antivirus updates.
  2. Not a Kernel Driver Update: Although the faulty file carries a .sys extension, it is a configuration ("channel") file, not a kernel driver. The distinction matters because the fault lay in content data consumed by the Falcon sensor rather than in the sensor's driver binary itself; since that sensor runs in kernel mode, the logic error still brought down the whole operating system rather than a single application.
  3. Update Frequency: The problematic update was part of CrowdStrike's routine "channel file" updates, which occur multiple times daily to enhance behavioral protection mechanisms. This process has been integral to CrowdStrike's operations since its inception.
  4. Specific Vulnerability: The crash was triggered by a logic error in Channel File 291, which is exclusive to Windows systems. macOS and Linux agents, which don't utilize this particular channel file, were unaffected.

Technical details:

Configuration File Primer

The configuration files mentioned above are referred to as “Channel Files” and are part of the behavioral protection mechanisms used by the Falcon sensor. Updates to Channel Files are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike. This is not a new process; the architecture has been in place since Falcon’s inception.

Technical Details

On Windows systems, Channel Files reside in the following directory:

C:\Windows\System32\drivers\CrowdStrike\

and have a file name that starts with “C-”. Each channel file is assigned a number as a unique identifier. The impacted Channel File in this event is 291 and will have a filename that starts with “C-00000291-” and ends with a .sys extension. Although Channel Files end with the SYS extension, they are not kernel drivers.

Channel File 291 controls how Falcon evaluates named pipe execution on Windows systems. Named pipes are used for normal, interprocess or intersystem communication in Windows.

The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks. The configuration update triggered a logic error that resulted in an operating system crash.

Systems running Linux or macOS do not use Channel File 291 and were not impacted.

Technical Details: Falcon Update for Windows Hosts | CrowdStrike
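
To make the named-pipe detail concrete: Windows exposes all currently open named pipes under the special \\.\pipe\ namespace, and behavioral detection content of the sort described above essentially evaluates pipe activity against known-bad patterns. The sketch below is purely illustrative - the watched substrings are hypothetical placeholders, not CrowdStrike's actual detection content - and simply enumerates open pipes on a Windows host:

import os

# Hypothetical examples of pipe-name substrings a detection rule might watch;
# placeholders only, not CrowdStrike's actual content.
SUSPICIOUS_HINTS = ("postex_", "msagent_")

try:
    # Listing the special \\.\pipe\ path enumerates currently open named pipes.
    pipe_names = os.listdir(r"\\.\pipe\\")
except OSError as exc:
    pipe_names = []
    print(f"Could not enumerate named pipes (non-Windows host?): {exc}")

for name in pipe_names:
    if any(hint in name.lower() for hint in SUSPICIOUS_HINTS):
        print(f"Pipe name matches a watched pattern: {name}")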

What's the fix?

The fix typically requires IT administrators to manually enter safe mode, delete the updated channel file, and then reboot the system. However, for devices with BitLocker enabled, the process becomes more complex. BitLocker is usually activated by default on enterprise devices to encrypt data. This encryption is crucial because, without it, if a device were physically stolen, a malicious actor could potentially access the data by removing the hard drive and connecting it to another computer.

For BitLocker-enabled devices, IT administrators must follow a more intricate procedure: boot into the BIOS, modify RAID configurations, return to the Windows boot process, manually input the BitLocker recovery key, enter Safe Mode, and finally delete the problematic update file.

While this process may not seem overly complicated to a technically proficient individual, it's unrealistic for IT administrators to expect every employee to perform these steps independently. Consequently, this situation is likely to overwhelm most enterprise IT departments in a short timeframe. The procedure, involving manual entry of BitLocker keys, multiple reboots, and navigating safe mode, is labor-intensive and time-consuming, requiring significant effort but not necessarily complex problem-solving skills.

While the impact on employee PCs is significant, the consequences for Windows servers can be far more severe, as any unplanned disruption to normal operations can be critical. Similar to employee PCs, these servers should theoretically be able to return to their previous state relatively quickly. However, recovering data that was in memory and not saved presents a major challenge. For instance, consider an airline booking system that processed a customer transaction moments before the disruption - this could result in missing records. Furthermore, servers often have numerous dependencies and states that require developers to meticulously recover, a process that could potentially take weeks. This is not to mention the substantial costs associated with system downtime, which can lead to scenarios as serious as airport closures or hospitals having to postpone surgeries. The ripple effects of such server disruptions can be far-reaching and long-lasting.

Initially, I thought this was just a minor event because most CrowdStrike Windows agents are used for employee PCs, and most enterprises use Linux for production servers. However, many legacy enterprises that developed their applications before ~2010 predominantly used Windows. It's no surprise, then, that we're seeing widespread BSODs in transportation, healthcare, retail, government, and other industries that are slow to evolve and rely on numerous legacy applications running on Windows. This can further complicate the recovery process because these legacy applications are often poorly documented, lack resilience, and aren't fault-tolerant. Consequently, in many cases a reboot isn't simply a reboot. This is where CrowdStrike will likely face numerous lawsuits and reputational damage, potentially impacting its SIEM and CNAPP expansion efforts.

The company, which acknowledged "reports of [Blue Screens of Death] on Windows hosts," further said it has identified the issue and a fix has been deployed for its Falcon Sensor product, urging customers to refer to the support portal for the latest updates.

For systems that have already been impacted by the problem, the mitigation instructions are listed below:

  1. Boot Windows into Safe Mode or the Windows Recovery Environment
  2. Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
  3. Find the file named "C-00000291*.sys" and delete it
  4. Restart the computer or server normally
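
The file-removal step lends itself to scripting once a technician is on the box. Below is a minimal sketch of steps 2-4, assuming an elevated Python interpreter is available (in practice the deletion is usually done manually or from a command prompt inside Safe Mode / WinRE); it only illustrates the file operation described above:

import glob
import os

# Path and filename pattern taken from the mitigation steps above.
TARGET_PATTERN = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

# Remove every matching Channel File 291 variant; requires administrator
# privileges and should only be run on hosts confirmed to be affected.
for path in glob.glob(TARGET_PATTERN):
    print(f"Deleting {path}")
    os.remove(path)

print("Done - reboot the host normally.")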

It's worth noting that the outage has also impacted Google Cloud Compute Engine, causing Windows virtual machines using CrowdStrike's csagent.sys to crash and go into an unexpected reboot state.

"After having automatically received a defective patch from CrowdStrike, Windows VMs crash and will not be able to reboot," it said. "Windows VMs that are currently up and running should no longer be impacted."

Microsoft Azure has also posted a similar update, stating it "received reports of successful recovery from some customers attempting multiple Virtual Machine restart operations on affected Virtual Machines" and that "several reboots (as many as 15 have been reported) may be required."

Amazon Web Services (AWS), for its part, said it has taken steps to mitigate the issue for as many Windows instances, Windows Workspaces, and AppStream Applications as possible, recommending that customers still affected by the issue "take action to restore connectivity."

Security researcher Kevin Beaumont said "I have obtained the CrowdStrike driver they pushed via auto update. I don't know how it happened, but the file isn't a validly formatted driver and causes Windows to crash every time."

Faulty CrowdStrike Update Crashes Windows Systems, Impacting Businesses Worldwide

What's the impact?

The consensus view from the sell-side is that this should be a one-off event and that investors should buy the dip. CRWD has been considered a high-quality, or even the highest-quality, cybersecurity or enterprise tech name in recent quarters. This reputation stems from CRWD's ability to deliver 30%+ growth despite having an ARR near $4bn. This is a remarkable figure, especially when compared to other best-of-breed (BoB) product and go-to-market (GTM) players with similar revenue bases, who are seeing growth rates decline to the 20%+ range (e.g., SNOW), resulting in multiple contraction. CRWD's strength is further evidenced by its ambitious forward revenue expectations prior to July 17th, including a FY28 revenue projection of $8.5bn, representing a 29% CAGR with FY24 as the starting point. This is an extraordinary projection, particularly in the context of the general deceleration observed in enterprise SaaS.
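
As a quick sanity check on that projection, the implied growth rate is easy to reproduce; the ~$3.06bn FY24 revenue base below is our own assumption for the starting figure, not a number taken from the projection itself:

# Back-of-envelope check of the implied FY24 -> FY28 growth rate.
fy24_revenue_bn = 3.06   # assumed FY24 base (approximate reported revenue)
fy28_target_bn = 8.5     # projection cited above
years = 4

cagr = (fy28_target_bn / fy24_revenue_bn) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.1%}")   # roughly 29%, in line with the figure above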

The key to CRWD's ability to maintain high growth despite its substantial revenue base has been its successful expansion into adjacent markets, including cloud security, identity protection, and Next-Gen Security Information and Event Management (NG-SIEM), since early 2023.

In recent quarters, CRWD has found it increasingly challenging to grow its core EDR agent business at 30% as the market approaches saturation, with most organizations having already adopted an endpoint security solution. In response, CRWD has been acquiring companies, developing new products, and cross-selling them to existing customers. While its core endpoint security business has been growing only in the mid-teens in recent quarters, its new lines of business have been posting triple-digit growth, enabling the company to maintain a consistent 30%+ overall growth rate.

Some investors have pointed out that previous security-related incidents at other companies, such as those experienced by McAfee and Okta, did not significantly impact revenue as initially feared. This perspective suggests that CrowdStrike might similarly weather this storm without substantial long-term financial consequences.

“Our recent conversations reaffirm our view that there will likely be minimal share shifts in endpoint post this event — although we recognize that additional details in the postmortem will further inform this view,” analysts led by Gabriela Borges wrote.

They pointed to a 2010 McAfee outage that caused computer crashes to give a sense of what came before last week’s events. “The revenue impact due to deferrals was about $6 million of deferred revenue not recognized from the balance sheet, and revenue was also negatively impacted by another approximately $14 million,” CEO Dave DeWalt told analysts on a conference call. Intel bought the antivirus company in 2011.

CrowdStrike shares tumble as fallout from global tech outage continues

However, we can't simply use surface-level empiricism to predict the future. McAfee's 2010 outage was caused by its antivirus software mistakenly treating critical system files as viruses, which is a textbook example of how antivirus can fail (ironically, George Kurtz was McAfee's CTO back then). Many observers initially thought that this CRWD outage was a repeat of this scenario, but it is not. The current issue was caused by CRWD's core EDR component, not its Next-Generation Antivirus (NGAV), which is more of a peripheral module for CRWD's business. If it had been caused by CRWD's agent mistargeting critical system files, it would indeed be a very rudimentary mistake. However, it was actually caused by its EDR agent channel file updates, which represents a novel type of error even for CRWD. Compared to antivirus false positive outages, which are a basic issue that every vendor needs to understand and test for, this kind of error is quite new. Consequently, even CRWD couldn't provide a clear explanation for days. For issues like those that occurred with McAfee in the past, the causes and solutions are well understood, and they are recognized as basic mistakes. This CRWD incident, however, represents uncharted territory.

It's not appropriate to criticize CRWD based on the assumption that this was a basic, well-known mistake. Instead, this incident represents a novel error that even CRWD itself struggled to immediately understand, given that the process in question had been functioning effectively for over a decade. Therein lies the core issue. If CRWD cannot definitively identify the cause and demonstrate measures to prevent recurrence, rebuilding trust will be extremely challenging. In such a scenario, the CRWD EDR agent risks being perceived as a 'black box'. Alternatively, its behavioral protection mechanism might need to be disabled, or the agent could be relegated to serving solely as log collection software. However, even in this reduced capacity, it appears the CRWD agent isn't immune to errors, as evidenced by Google's reported crash. Notably, Google utilizes the CRWD agent purely for log collection, having built its own Security Operations Center (SOC) and data backend to manage these agents.

The CRWD incident differs significantly from the Okta event. In Okta's case, the breach primarily resulted in the exposure of customers' contact information, rather than critical operational data such as employee device authentication keys. This limited data leak, while concerning, didn't pose an immediate or severe threat to customers' operations. The stolen data could potentially be used for phishing campaigns, but hackers wouldn't be able to directly access enterprise Okta Single Sign-On (SSO) systems using this information. Had that been possible, the Okta breach would have been comparable in severity to the CrowdStrike crash.

Moreover, the nature of Okta's data leak meant that it didn't immediately disrupt normal business operations. Employees could still use their PCs, and customers could continue to access necessary information. This stands in stark contrast to the widespread system failures caused by the CRWD incident.

Furthermore, Okta's market position is unique. The primary alternative for cloud Identity and Access Management (IAM) is Microsoft's Azure Entra ID, but non-Microsoft customers and digital-native companies typically don't consider it a viable option. This exclusivity gives Okta a significant advantage. Okta's growth still primarily depends on its core IAM product, rather than Identity Governance and Administration (IGA) or Privileged Access Management (PAM). Consequently, the reputational damage from the breach didn't significantly impact Okta's cross-selling efforts or dramatically decelerate its revenue growth.

So what's the impact on CRWD? Obviously, its growth and margin are going to come down, but how?

  • The most direct impact for CRWD is on its existing endpoint security install base. The incident ensnared 8.5 million Windows devices, less than 1% of the global total, Microsoft said, and this will directly affect the total number of devices CRWD covers. There are also agents for macOS and Linux, but historically CRWD has focused primarily on Windows, so these constitute a minority. Similar to Elon Musk, who quickly said he had deleted all CRWD agents within TSLA, there will be many companies that move off CRWD or diversify away from it.
  • A majority of CRWD customers could still keep the product until the license expires because it is quite an expensive product. Upon renewal, they will opt for other solutions or ask CRWD for greater discounts. CRWD may be willing to cut prices massively to keep them onboard, because the alternative is losing these revenue streams entirely. For critical server workloads, these customers should and will move to solutions like S (SentinelOne), where the agent sits more on the sidelines to minimize potential inline impact on operations. Server agents represent a smaller revenue portion within CRWD, so this wouldn't impact the total number of installations as much as employee PCs do.
  • For new customers, it will be very hard to treat CRWD as the obvious first choice without even including other vendors in the initial shortlist. If S and PANW then get a chance to compete, there is a good chance that CRWD will lose on credibility, pricing, and performance. Again, CRWD's best way to alleviate this will be via price cuts and potentially heavier marketing to cover the aftermath of this event. As a result, we can easily see CRWD's gross margin slide back down to 75% or lower, with increased S&M as a % of revenue.
  • The CRWD incident could also significantly impact its non-EDR businesses, such as cloud security and NG-SIEM. CRWD has primarily relied on cross-selling opportunities to drive overall company revenue growth. While Palo Alto Networks (PANW) employed a similar strategy, its Prisma and Cortex businesses only have a 30%-60% overlap with its traditional firewall business. This is because PANW acquired startups that already had GTM success and founders capable of independently growing their market share, which PANW then further accelerated. These acquired products can be sold independently and compete with standalone BoB solutions like Wiz. In contrast, CRWD mostly acquired products and sold them to existing customers. It appears that almost all buyers of these emerging CRWD products are already CRWD customers, which is markedly different from PANW's approach. This high dependency on existing customers could potentially amplify the negative impact of the EDR incident across CrowdStrike's entire product portfolio.

Implications for CRWD's non-EDR Business

CRWD's success in non-EDR sales can be attributed in large part to the laziness of many CIOs and CISOs who hadn't yet adopted cloud security, NG-SIEM, or other emerging security products. CRWD's sales representatives exploited this gap, offering these missing pieces to large enterprise customers at 1/10 the price compared to BoB market leaders like Wiz or PANW's Prisma Cloud. Many of these decision-makers, reluctant to invest time in thorough market evaluation and performance testing, opted for CRWD's additional products. This surprising trend is corroborated by our coverage of Zscaler and conversations with industry practitioners. CRWD's cloud security competitors were often blindsided by this 'shadow competitor' that rarely participated in POC evaluations yet managed to generate over $100m in cloud security revenue. Our extensive analysis of cloud security startups reveals that while CRWD offers a competent cloud security product, it's not considered BoB. Unlike PANW's Prisma Cloud, which is widely adopted even by companies without an existing firewall install base, CRWD's cloud security offerings are primarily purchased by their existing customers, indicating a reliance on cross-selling rather than standalone product strength.

The primary drivers behind the growth of CRWD's non-EDR business are:

  1. Its stellar reputation, not only as an endpoint security company but across the entire cybersecurity landscape. CRWD is often the first point of contact for enterprises experiencing a breach, and companies trust both CRWD's EDR product and its exceptional professional service team.
  2. Despite premium pricing, most customers (especially at the C-suite level) hold CRWD in high regard, appreciating both its service quality and powerful marketing efforts.
  3. The majority of non-EDR buyers are existing customers who simply trusted the CRWD brand without conducting in-depth analysis, or were persuaded by the native platform integration and bundled pricing.

Given the magnitude, scale, and severity of the recent incident, most of these customers will likely need to reassess their decision to adopt additional CRWD products. This reassessment could significantly reduce demand and hinder CRWD's ability to further cross-sell these non-EDR products. As discounts are already quite steep, further price reductions may not be an effective strategy. For new customers considering adopting the full platform and increasing their reliance on CRWD, this incident clearly suggests caution is warranted.

Rounding it up

To summarize, our analysis suggests that CRWD is likely to experience a decline in both revenue growth and margins over the next few quarters. However, during an economic downturn, most customers will find it challenging to switch to alternative solutions before their expensive CRWD licenses expire. Those who opted for the full platform bundle at a lower cost may also lack the budget to purchase products from other vendors, such as Wiz. This situation could potentially benefit Palo Alto Networks (PANW) if they can offer a competitively priced Cortex XDR + Prisma Cloud bundle and leverage their recent policy of providing free usage periods to cover gaps for customers with existing CRWD licenses. We will address the impact on other vendors in a separate note.

In the short term, Goldman Sachs' projection may be accurate. We don't anticipate a sharp decline in CRWD revenue, especially for the upcoming quarter, as they should be able to close most deals before July 15th. However, for deals after that date, leading indicators like billing and sales channel checks may paint a different picture. In the coming quarters, we may see CRWD revising its growth guidance downward to the low 20% range or even mid-teens. This revision would primarily be due to necessary price concessions and potential customer losses. However, the decline might be partially mitigated by factors such as customer lock-in (due to professional services and security data) and some customers' willingness to stay with the vendor (as seen with SolarWinds maintaining positive growth post-breach). The net effect of these opposing forces is likely to result in slower growth, but not a complete collapse of CRWD's business.

Our base assumption is that 50% of customers will decide to leave or diversify away from CRWD, potentially creating a ~15% growth headwind. CRWD may also need to make more price concessions, which could lead to a decline in FCF to around 20%. If this scenario materializes, the market may not yet be pricing in this future deterioration. We might see the next earnings report (August 29th) perform better than feared, temporarily calming the market. However, management may provide weaker guidance, and subsequent weak channel signals could put CRWD back into a downward trend.
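
For transparency, the arithmetic behind this base case is deliberately simple and can be sketched as follows; every input restates the speculative assumptions above rather than any reported figure:

# Illustrative scenario math only; all inputs are assumptions from the text above.
pre_incident_growth = 0.30   # assumed ~30% run-rate growth before the incident
growth_headwind = 0.15       # headwind from ~50% of customers leaving or diversifying

scenario_growth = pre_incident_growth - growth_headwind
print(f"Scenario revenue growth: {scenario_growth:.0%}")   # mid-teens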

These projections are speculative, and we will continue to monitor developments. We plan to offer a more comprehensive DCF valuation in our upcoming update.
