From 68dd25510e256142d37167e6e8d96278e40361f5 Mon Sep 17 00:00:00 2001 From: "Allen D. Householder" Date: Thu, 29 Oct 2020 11:38:59 -0400 Subject: [PATCH] split 040_trees... into 4 separate files to reduce edit conflicts --- doc/version_1/040_treesForVulMgmt.md | 417 ------------------------- doc/version_1/042_treesForVulMgmt_2.md | 106 +++++++ doc/version_1/045_treesForVulMgmt_3.md | 254 +++++++++++++++ doc/version_1/047_treesForVulMgmt_4.md | 55 ++++ 4 files changed, 415 insertions(+), 417 deletions(-) create mode 100644 doc/version_1/042_treesForVulMgmt_2.md create mode 100644 doc/version_1/045_treesForVulMgmt_3.md create mode 100644 doc/version_1/047_treesForVulMgmt_4.md diff --git a/doc/version_1/040_treesForVulMgmt.md b/doc/version_1/040_treesForVulMgmt.md index 68055fe8..a25d4706 100644 --- a/doc/version_1/040_treesForVulMgmt.md +++ b/doc/version_1/040_treesForVulMgmt.md @@ -105,420 +105,3 @@ Products, libraries, and applications tend to be appropriate objects of focus wh ### Reasoning Steps Forward This aspect of scope is about immediacy, prevalence, and causal importance. Immediacy is about how soon after the decision point adverse effects should occur to be considered relevant. Prevalence is about how common adverse effects should be to be considered relevant. Causal importance is about how much an exploitation of the software in the cyber-physical system contributes to adverse effects to be considered relevant. Our recommendation is to walk a pragmatic middle path on all three aspects. Effects are not relevant if they are merely possible, too infrequent, far distant, or unchanged by the vulnerability. But effects are relevant long before they are absolutely certain, ubiquitous, or occurring presently. Overall, we summarize this aspect of scope as *consider credible effects based on known use cases of the software system as a part of cyber-physical systems*. - -## Likely Decision Points and Relevant Data - -We propose the following decision points and associated values should be a factor when making decisions about vulnerability prioritization. Each decision point is tagged with the stakeholder it is relevant to: deployers, suppliers, or both. We emphasize that these descriptions are hypotheses to be further tested and validated. We made every effort to put forward informed and useful decision frameworks with wide applicability, but the goal of this paper is more to solicit feedback than make a declaration. We welcome questions, constructive criticism, refuting evidence, or supporting evidence about any aspect of this proposal. - -One important omission from the values for each category is an “unknown” option. Instead, we recommend explicitly identifying an option that is a reasonable assumption based on prior events. Such an option requires reliable historical evidence for what tends to be the case; of course, future events may require changes to these assumptions over time. Therefore, our assumptions require evidence and are open to debate in light of new evidence. Different risk tolerance or risk discounting postures are not addressed in the current work; accommodating such tolerance or discounting explicitly is an area for future work. This flexibility fits into our overall goal of supplying a decision-making framework that is both transparent and fits the needs of different communities. Resisting an “unknown” option discourages the modeler from silently embedding these assumptions in their choices for how the decision tree flows below the selection of any “unknown” option. 
-
-We propose satisfactory decision points for vulnerability management in the next sections, in no particular order.
-
-### Exploitation (Supplier, Deployer)
-> Evidence of Active Exploitation of a Vulnerability
-
-The intent of this measure is to capture the present state of exploitation of the vulnerability. The intent is not to predict future exploitation but only to acknowledge the current state of affairs. Predictive systems, such as EPSS, could be used to augment this decision or to notify stakeholders of likely changes [@jacobs2019exploit].
-
-| | Table 4: Exploitation Decision Values |
-| --- | --------------------------------- |
-| None | There is no evidence of active exploitation and no public proof of concept (PoC) of how to exploit the vulnerability. |
-| PoC
(Proof of Concept) | One of the following cases is true: (1) exploit code sold or traded on underground or restricted fora; (2) typical public PoC in places such as Metasploit or ExploitDB; or (3) the vulnerability has a well-known method of exploitation. One example of condition (3) is that open-source web proxies serve as PoC code for how to exploit any vulnerability in the vein of improper validation of TLS certificates. As another example, Wireshark serves as a PoC for packet replay attacks on Ethernet or WiFi networks. |
-| Active | Shared, observable, reliable evidence that the exploit is being used in the wild by real attackers; there is credible public reporting. |
-
-### System Exposure (Deployer)
-> The Accessible Attack Surface of the Affected System or Service
-
-Measuring attack surface precisely is difficult, and we do not propose to perfectly delineate between small and controlled access.
-Exposure should be judged against the system in its deployed context, which may differ from how it is commonly expected to be deployed.
-For example, the exposure of a device on a vehicle's CAN bus will vary depending on the presence of a cellular telemetry device on the same bus.
-
-If a vulnerability cannot be patched, other mitigations may be used.
-Usually, the effect of these mitigations is to reduce exposure of the vulnerable component.
-Therefore, a deployer’s response to Exposure may change if such mitigations are put in place.
-If a mitigation changes exposure and thereby reduces the priority of a vulnerability, that mitigation can be considered a success.
-Whether that mitigation allows the deployer to defer further action varies according to each case.
-
-| | Table 9: Exposure Decision Values |
-| ----------- | --------------------------------- |
-| Small | Local service or program; highly controlled network |
-| Controlled | Networked service with some access restrictions or mitigations already in place (whether locally or on the network). A successful mitigation must reliably interrupt the adversary’s attack, which requires that the attack be detectable both reliably and quickly enough to respond. *Controlled* covers the situation in which a vulnerability can be exploited through chaining it with other vulnerabilities. The assumption is that the number of steps in the attack path is relatively low; if the path is long enough that it is implausible for an adversary to reliably execute it, then *exposure* should be *small*. |
-| Open | Internet or another widely accessible network where access cannot plausibly be restricted or controlled (e.g., DNS servers, web servers, VoIP servers, email servers) |
-
-### Technical Impact (Supplier)
-> Technical Impact of Exploiting the Vulnerability
-
-When evaluating *technical impact*, recall the scope definition above. Total control is relative to the affected component where the vulnerability resides.
If a vulnerability discloses authentication or authorization credentials to the system, this information disclosure should also be scored as “total” if those credentials give an adversary total control of the component.
-
-| | Table 5: Technical Impact Decision Values |
-| ------- | ----------------------------------------- |
-| Partial | The exploit gives the adversary *limited* control over, or information exposure about, the behavior of the software that contains the vulnerability. Or the exploit gives the adversary an importantly low stochastic opportunity for total control. In this context, “low” means that the attacker cannot reasonably make enough attempts to overcome the low chance of each attempt not working. Denial of service is a form of limited control over the behavior of the vulnerable component. |
-| Total | The exploit gives the adversary *total* control over the behavior of the software, or it gives total disclosure of all information on the system that contains the vulnerability |
-
-### Utility (Supplier, Deployer)
-> The Usefulness of the Exploit to the Adversary
-
-Heuristically, we base *utility* on a combination of value density of vulnerable components and automatability of potential exploitation. This framing makes it easier to analytically derive these categories from a description of the vulnerability and the affected component. Automatability (slow or rapid) and value density (diffuse or concentrated) are defined in Sections 4.4.3.1 and 4.4.3.2. Deployers currently use this feature only as a suggested constraint on the values for *Mission Impact*.
-
-Roughly, *utility* is a combination of two things: (1) the value of each exploitation event and (2) the ease and speed with which the adversary can cause exploitation events. We define *utility* as laborious, efficient, or super effective, as described in Table 6.
-
-| | Table 6: Utility Decision Values |
-| --------------- | -------------------------------- |
-| Laborious | Slow automatability and diffuse value |
-| Efficient | {Rapid automatability and diffuse value} OR {Slow automatability and concentrated value} |
-| Super Effective | Rapid automatability and concentrated value |
-
-#### Automatability
-
-*Automatability* is described as slow or rapid:
-
- - **Slow**. Attackers cannot reliably automate steps 1-4 of the kill chain [@hutchins2011intelligence] for this vulnerability for some reason. These steps are reconnaissance, weaponization, delivery, and exploitation. Example reasons why a step may not be reliably automatable include (1) the vulnerable component is not searchable or enumerable on the network, (2) weaponization may require human direction for each target, (3) delivery may require channels that widely deployed network security configurations block, and (4) exploitation may be frustrated by adequate exploit-prevention techniques enabled by default; ASLR is an example of an exploit-prevention tool.
-
- - **Rapid**. Attackers can reliably automate steps 1-4 of the kill chain.
If the vulnerability allows remote code execution or command injection, the default response should be rapid.
-
-Due to vulnerability chaining, there is some nuance as to whether reconnaissance can be automated. For example, consider a vulnerability A. If the systems vulnerable to A are usually not openly connected to incoming traffic ([*Exposure*](#exposure) is [small](#exposure) or [controlled](#exposure)), reconnaissance probably cannot be automated (as scans should be blocked, etc.). This fact would make automatability [slow](#automatability). However, if another vulnerability B with [rapid](#automatability) automatability can be reliably used to chain to vulnerability A, then that automates reconnaissance of vulnerable systems. In such a situation, the analyst should continue to analyze vulnerability A to understand whether the remaining steps in the kill chain can be automated.
-
-#### Value Density
-
-*Value density* is described as diffuse or concentrated:
-
- - **Diffuse**. The system that contains the vulnerable component has limited resources. That is, the resources that the adversary will gain control over with a single exploitation event are relatively small. Examples of systems with diffuse value are email accounts, most consumer online banking accounts, common cell phones, and most personal computing resources owned and maintained by users. (A “user” is anyone whose professional task is something other than the maintenance of the system or component. As with *safety impact*, a “system operator” is anyone who is professionally responsible for the proper operation or maintenance of a system.)
-
- - **Concentrated**. The system that contains the vulnerable component is rich in resources. Heuristically, such systems are often the direct responsibility of “system operators” rather than users. Examples of concentrated value are database systems, Kerberos servers, web servers hosting login pages, and cloud service providers. However, usefulness and uniqueness of the resources on the vulnerable system also inform value density. For example, encrypted mobile messaging platforms may have concentrated value, not because each phone’s messaging history has a particularly large amount of data, but because it is uniquely valuable to law enforcement.
-
-The output for the *Utility* decision point is visualized in Table 7.
-
-Table 7: Utility to the Adversary, as a Combination of Automatability and Value Density
-
-| *Automatability* | *Value Density* | *Utility* |
-| ---------------- | ---------------- | --------------- |
-| **slow** | **diffuse** | laborious |
-| **slow** | **concentrated** | efficient |
-| **rapid** | **diffuse** | efficient |
-| **rapid** | **concentrated** | super effective |
-
-Alternative heuristics for proxying adversary utility are plausible. One such example is the value the vulnerability would have were it sold on the open market. Some firms, such as [Zerodium](https://zerodium.com/program.html), make such pricing structures public. The valuable exploits track the automatability and value density heuristics for the most part. Within a single system—whether it is Apache, Windows, iOS or WhatsApp—successfully automating more kill chain steps leads to higher exploit value. Remote code execution with sandbox escape and without user interaction are the most valuable exploits, and those features describe automation of the relevant kill chain steps.
How equivalently virulent exploits for different systems are priced relative to each other is more idiosyncratic. Price does not only track value density of the system, but presumably also the existing supply of exploits and the installation distribution among the targets of Zerodium’s customers. Currently, we simplify the analysis and ignore these factors. However, future work should look for and prevent large mismatches between the outputs of the *utility* decision point and the exploit markets. - -### Safety Impact (Supplier, Deployer) -> Safety Impacts of Affected System Compromise - -We take an expansive view of safety, in which a safety violation is a violation of what the [Centers for Disease Control (CDC)](https://www.cdc.gov/hrqol/wellbeing.htm#three) calls **well-being**. Physical well-being violations are common safety violations, but we also include economic, social, emotional, and psychological well-being as important. Weighing fine differences among these categories is probably not possible, so we will not try. Each decision option lists examples of the effects that qualify for that value/answer in the various types of violations of well-being. These examples should not be considered comprehensive or exhaustive, but rather as suggestive. - - -The stakeholder should consider the safety impact on the operators (heuristically, by “system operator” we mean those who are professionally -responsible for the proper operation of the cyber-physical system, as the term is used in the safety analysis literature) and users of the software they provide. If software is repackaged and resold by a stakeholder to further downstream entities who will then sell a product, the initial stakeholder can only reasonably consider so many links in that supply chain. But a stakeholder should know its immediate consumers one step away in the supply chain. Those consumers may repackage or build on the software and then provide that product further on. - -We expect that a stakeholder should be aware of common usage of their software about two steps in the supply chain away. This expectation holds in both open source and proprietary contexts. Further steps along the supply chain are probably not reasonable for the stakeholder to consider consistently; however, this is not license to willfully ignore common downstream uses of the stakeholder’s software. If the stakeholder is contractually or legally responsible for safe operation of the software or cyber-physical system of which it is part, that also supersedes our rough supply-chain depth considerations. For software used in a wide variety of sectors and deployments, the stakeholder may need to estimate an aggregate safety impact. Aggregation suggests that the stakeholder’s response to this decision point cannot be less than the most severe credible safety impact, but we leave the specific aggregation method or function as a domain-specific extension for future work. - -#### Advice for Gathering Information to Answer the Safety Impact Question - -The factors that influence the safety impact level are diverse. This paper does not exhaustively discuss how a stakeholder should answer a question; that is a topic for future work. At a minimum, understanding safety impact should include gathering information about survivability of the vulnerable component, determining available operator actions to compensate for the vulnerable component, understanding relevant insurance, and determining the viability of existing backup measures. 
Each of these information items depends heavily on domain-specific knowledge, and so it is out of the scope of this paper to give a general-purpose strategy for how they should be included. For example, viable manual backup mechanisms are likely important in assessing the safety impact of an industrial control system in a sewage plant, but in banking the insurance structures that prevent bankruptcies are more important.
-
-The safety impact categories in Table 8 are based on hazard categories for aircraft software [@DO-178C; @faa2000safety, Section 3.3.2].
-
-Table 8: Safety Impact Decision Values
-
-| Safety Impact | Type of Harm | Description |
-| ------------- | ------------ | ----------- |
-| None | All | Does not mean no impact literally; it just means that the effect is below the threshold for all aspects described in Minor |
-| Minor (Any one or more of these conditions hold.) | Physical harm | Physical discomfort for users (not operators) of the system |
-| | Operator resiliency | Requires action by system operator to maintain safe system state as a result of exploitation of the vulnerability where operator actions would be well within expected operator abilities; OR causes a minor occupational safety hazard |
-| | System resiliency | Small reduction in built-in system safety margins; OR small reduction in system functional capabilities that support safe operation |
-| | Environment | Minor externalities (property damage, environmental damage, etc.) imposed on other parties |
-| | Financial | Financial losses, which are not readily absorbable, to multiple persons |
-| | Psychological | Emotional or psychological harm, sufficient to be cause for counselling or therapy, to multiple persons |
-| Major (Any one or more of these conditions hold.) | Physical harm | Physical distress and injuries for users (not operators) of the system |
-| | Operator resiliency | Requires action by system operator to maintain safe system state as a result of exploitation of the vulnerability where operator actions would be within their capabilities but the actions require their full attention and effort; OR significant distraction or discomfort to operators; OR causes significant occupational safety hazard |
-| | System resiliency | System safety margin effectively eliminated but no actual harm; OR failure of system functional capabilities that support safe operation |
-| | Environment | Major externalities (property damage, environmental damage, etc.) imposed on other parties |
-| | Financial | Financial losses that likely lead to bankruptcy of multiple persons |
-| | Psychological | Widespread emotional or psychological harm, sufficient to be cause for counselling or therapy, to populations of people |
-| Hazardous (Any one or more of these conditions hold.) | Physical harm | Serious or fatal injuries, where fatalities are plausibly preventable via emergency services or other measures |
-| | Operator resiliency | Actions that would keep the system in a safe state are beyond system operator capabilities, resulting in adverse conditions; OR great physical distress to system operators such that they cannot be expected to operate the system properly |
-| | System resiliency | Parts of the cyber-physical system break; system’s ability to recover lost functionality remains intact |
-| | Environment | Serious externalities (threat to life as well as property, widespread environmental damage, measurable public health risks, etc.) imposed on other parties |
-| | Financial | Socio-technical system (elections, financial grid, etc.) of which the affected component is a part is actively destabilized and enters unsafe state |
-| | Psychological | N/A |
-| Catastrophic (Any one or more of these conditions hold.) | Physical harm | Multiple immediate fatalities (Emergency response probably cannot save the victims.) |
-| | Operator resiliency | Operator incapacitated (includes fatality or otherwise incapacitated) |
-| | System resiliency | Total loss of whole cyber-physical system, of which the software is a part |
-| | Environment | Extreme externalities (immediate public health threat, environmental damage leading to small ecosystem collapse, etc.) imposed on other parties |
-| | Financial | Social systems (elections, financial grid, etc.) supported by the software collapse |
-| | Psychological | N/A |
-
-#### Public Safety Impact (Supplier)
-
-Suppliers necessarily have a rather coarse-grained perspective on the broadly defined safety impacts described above. Therefore we simplify the above into a binary categorization: _Significant_ is when any impact meets the criteria for an impact of Major, Hazardous, or Catastrophic in the above table. _Minimal_ is when none do.
-
-| | Table X: Public Safety Impact |
-| ----------- | ---------------------------------------------------|
-| Minimal | Safety Impact of None or Minor |
-| Significant | Safety Impact of Major, Hazardous, or Catastrophic |
-
-#### Situated Safety Impact (Deployer)
-
-Deployers are anticipated to have a more fine-grained perspective on the safety impacts broadly defined in Table 8. However, in order to simplify implementation for deployers we intend to combine this with Mission Impact below, so we defer the topic for now.
-
-### Mission Impact (Deployer)
-> Impact on Mission Essential Functions of the Organization
-
-A **mission essential function (MEF)** is a function “directly related to accomplishing the organization’s mission as set forth in its statutory or executive charter” [@FCD2_2017, page A-1]. Identifying MEFs is part of business continuity planning or crisis planning. The rough difference between MEFs and non-essential functions is that an organization “must perform a\[n MEF\] during a disruption to normal operations” [@FCD2_2017, page B-2]. The mission is the reason an organization exists, and MEFs are how that mission is affected. Non-essential functions do not directly support the mission per se; however, they support the smooth delivery or success of MEFs. Financial losses—especially to publicly traded for-profit corporations—are covered here because a (legally mandated) mission of such corporations is financial performance.
-
-| | Table 10: Mission Impact Decision Values |
-| ---------------------- | ---------------------------------------- |
-| None / Non-Essential Degraded | Little to no impact up to degradation of non-essential functions; chronic degradation would eventually harm essential functions |
-| MEF Support Crippled | Activities that directly support essential functions are crippled; essential functions continue for a time |
-| MEF Failure | Any one mission essential function fails for a period of time longer than acceptable; overall mission of the organization degraded but can still be accomplished for a time |
-| Mission Failure | Multiple or all mission essential functions fail; ability to recover those functions degraded; organization’s ability to deliver its overall mission fails |
-
-#### Advice for Gathering Information to Answer the Mission Impact Question
-
-The factors that influence the mission impact level are diverse. This paper does not exhaustively discuss how a stakeholder should answer a question; that is a topic for future work. At a minimum, understanding mission impact should include gathering information about the critical paths that involve vulnerable components, viability of contingency measures, and resiliency of the systems that support the mission. There are various sources of guidance on how to gather this information; see for example the FEMA guidance in Continuity Directive 2 [@FCD2_2017] or OCTAVE FORTE [@tucker2018octave]. This is part of risk management more broadly.
Gathering this information will likely require the vulnerability management team to interact with more senior management to understand mission priorities and other aspects of risk mitigation.
-
-As a heuristic, we suggest using the question described in Section 4.4.3, *Utility*, to constrain *Mission Impact*. If *Utility* is **super effective**, then *Mission Impact* is at least **MEF support crippled**. If *Utility* is **efficient**, then *Mission Impact* is at least **non-essential degraded**.
-
-### Situated Safety / Mission Impact (Deployer)
-
-In pilot implementations of SSVC, we received feedback that organizations tend to think of mission and safety impacts as if they were combined into a single factor: in other words, the priority increases regardless of which of the two impact factors was increased. We therefore combine Situated Safety and Mission Impact for deployers into a single Potential Impact factor as a dimension reduction step as follows.
-We observe that the day-to-day operations of an organization often have already built in a degree of tolerance to small-scale variance in mission impacts. Thus in our opinion we need only concern ourselves with discriminating well at the upper end of the scale.
-Therefore we combine the three lesser mission impacts of none, non-essential degraded, and MEF support crippled into a single category, while retaining the distinction between MEF Failure and Mission Failure at the extreme.
-This gives us 3 levels of mission impact to work with.
-
-On the other hand, most organizations' tolerance for variance in safety tends to be lower, meaning that even small deviations in safety are unlikely to go unnoticed or unaddressed.
-We suspect that the presence of regulatory oversight for safety issues and its absence at the lower end of the mission impact scale influences this behavior.
-Because of this higher sensitivity to safety concerns, we chose to retain a four-level resolution for the safety dimension. We then combine Mission Impact with Situated Safety impact and map these onto a 4-tiered scale (Low, Medium, High, Very High). The mapping is shown in Table X.
-
-Table X: Situated Safety / Mission Impact Decision Values
-
-| *Safety Impact* \ *Mission Impact* | None/Degraded/Crippled | MEF Failure | Mission Failure |
-| ---------------------------------- | ---------------------- | ----------- | --------------- |
-| None/Minor | Low | Medium | Very High |
-| Major | Medium | High | Very High |
-| Hazardous | High | High | Very High |
-| Catastrophic | Very High | Very High | Very High |
-
-## Relationship to asset management
-
-Our method is for prioritizing vulnerabilities based on the risk stemming from exploitation. There are other reasonable asset management considerations that may influence remediation timelines. There are at least three aspects of asset management that may be important but are out of scope for SSVC. First and most obvious is the transaction cost of conducting the mitigation or fix. System administrators are paid to develop or apply any fixes or mitigations, and there may be other transactional costs such as downtime for updates. Second is the risk of the fix or mitigation introducing a new error or vulnerability. Regression testing is part of managing this type of risk. Finally, there may be an operational cost of applying a fix or mitigation, representing an ongoing change of functionality or increased overhead. A decision maker could order work within one SSVC priority class (scheduled, out-of-cycle, etc.) based on these asset management considerations, for example.
Once the organization fixes all of the high-priority vulnerabilities, it can apply the same effort to the medium-priority ones.
-
-Asset management and risk management also drive some of the up-front work an organization would need to do to gather some of the necessary information. This situation is not new; an asset owner cannot prioritize which fixes to deploy to its assets if it does not know what assets it owns and their locations. The organization can pick its own tools for these tasks; there are about 200 asset management tools on the market [@captera]. Standards like the Software Bill of Materials (SBOM) [@manion2019sbom] would likely reduce the burden on asset management, but these are still maturing. If an organization does not have an asset management or risk management (see Section 4.4.6.1) plan and process in place, then it will have a non-trivial amount of work to do to establish these processes before it can take full advantage of SSVC.
-
-## Supplier Tree
-
-Figure 1 shows the proposed prioritization decision tree for the supplier. Both supplier and deployer trees use the above decision point definitions. Each tree is a compact way of expressing assertions or hypotheses about the relative priority of different situations. Each tree organizes how we propose a stakeholder should treat these situations. Rectangles are decision points, and triangles represent outcomes. The values for each decision point are different, as described above. Outcomes are priority decisions (defer, scheduled, out-of-cycle, immediate); outcome triangles are color coded:
-
- - Defer = gray with green outline
- - Scheduled = yellow
- - Out-of-Cycle = orange
- - Immediate = red with black outline
-
-Figure 1
-
-Figure 1: Proposed Vulnerability Prioritization Decision Tree for Patch Supplier
-
-## Deployer Tree
-
-The proposed deployer tree is depicted in Figure 2, Figure 3, and Figure 4. The state of *Exploitation* is the first decision point, but in an effort to make the tree legible, we split the tree into three sub-trees over three pages. We suggest making the decision about *Exploitation* as usual, and then going to the correct subtree.
-
-Figure 2
-
-Figure 2: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (Continued in Figure 3 and Figure 4)
-
-Figure 3
-
-Figure 3: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (Continued from Figure 2 and in Figure 4)
-
-Figure 4
-
-Figure 4: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (Continued from Figure 2 and Figure 3)
-
-## Evidence Gathering Guidance
-
-To answer each of these decision points, a supplier or deployer should, as much as possible, have a repeatable evidence collection and evaluation process. However, we are proposing decisions for humans to make, so evidence collection and evaluation is not totally automatable. That caveat notwithstanding, some automation is possible.
-
-For example, determining whether exploitation modules are available in ExploitDB, Metasploit, or other sources is straightforward. We hypothesize that searching GitHub and Pastebin for exploit code should be automatable. A supplier or deployer could then define *Exploitation* **PoC available** to be positive search results for a set of inputs derived from the CVE entry in at least one of these venues, at least for those vulnerabilities that are not “automatically” PoC-ready, such as those exploitable by on-path attackers against TLS or via network replay.
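
As a sketch of how such an automated check might look, the following Python fragment searches a locally mirrored ExploitDB index for a CVE ID. The file name and the `codes` column are assumptions about the mirror's schema, not part of SSVC; adjust them to whatever source is actually mirrored.

```python
import csv

def poc_evidence(cve_id: str, exploitdb_csv: str = "files_exploits.csv") -> bool:
    """Return True if any entry in a local ExploitDB index references cve_id.

    Assumes the index is a CSV whose rows carry a 'codes' column listing
    associated vulnerability identifiers (an assumption; adjust to the
    actual schema of the mirror in use).
    """
    with open(exploitdb_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if cve_id in (row.get("codes") or ""):
                return True
    return False

# Map the search result onto the Exploitation decision point; "active"
# would still require separate evidence of exploitation in the wild.
exploitation = "PoC" if poc_evidence("CVE-2017-8083") else "none"
```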
-
-Some of the decision points require some substantial upfront analysis effort to gather risk assessment or organizational data. However, once gathered, this information can be efficiently reused across many vulnerabilities and only refreshed occasionally. An obvious example of this is the mission impact decision point. To answer this, a deployer must analyze their essential functions, how they interrelate, and how they are supported. Exposure is similar; answering that decision point requires an asset inventory, adequate understanding of the network topology, and a view of the enforced security controls. Independently operated scans, such as Shodan or Shadowserver, may play a role in evaluating exposure, but the entire exposure question cannot be reduced to a binary question of whether an organization’s assets appear in such databases. Once the deployer has the situational awareness to understand MEFs or exposure, selecting the answer for each individual vulnerability is usually straightforward.
-
-Stakeholders who use the prioritization method should consider releasing the priority with which they handled the vulnerability. This disclosure has various benefits. For example, if the supplier publishes a priority ranking, then deployers could consider that in their decision-making process. One reasonable way to include it is to break ties for the deployer. If a deployer has three “scheduled” vulnerabilities to patch, they may address them in any order. If two of the vulnerabilities were produced by the supplier as “scheduled” patches, and one was “out-of-cycle,” then the deployer may want to use that information to favor the latter.
-
-In the case where no information is available or the organization has not yet matured its initial situational analysis, we can suggest reasonable defaults for some decision points. If the deployer does not know their exposure, that means they do not know where the devices are or how they are controlled, so they should assume *Exposure* is **open**. If the decision maker knows nothing about the environment in which the device is used, we suggest assuming a **major** *Safety Impact*. This position is conservative, but software is thoroughly embedded in daily life now, so we suggest that the decision maker provide evidence that no one’s well-being will suffer. The reach of software exploits is no longer limited to a research network. Similarly, with *Mission Impact*, the deployer should assume that the software is in use at the organization for a reason, and that it supports essential functions unless they have evidence otherwise. With a total lack of information, assume **MEF support crippled** as a default. *Exploitation* needs no special default; if adequate searches are made for exploit code and none is found, the answer is **none**. The decision set {**none**, **open**, **MEF support crippled**, **major**} results in a scheduled patch application.
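
To make the fallback behavior concrete, here is a minimal sketch of these defaults in Python; the dictionary keys, value spellings, and the `with_defaults` helper are illustrative, not part of SSVC.

```python
# Conservative defaults suggested above for a deployer with no
# situational awareness; an organization should replace these as its
# asset and risk analysis matures.
DEPLOYER_DEFAULTS = {
    "Exploitation": "none",      # assumes adequate searches found no exploit code
    "Exposure": "open",          # unknown deployment context, so assume the worst
    "Mission Impact": "MEF support crippled",
    "Safety Impact": "major",
}

def with_defaults(known: dict) -> dict:
    """Fill any unanswered decision points with the conservative defaults."""
    return {**DEPLOYER_DEFAULTS, **known}

# With no information at all, this yields the decision set
# {none, open, MEF support crippled, major}, which the text above notes
# results in a scheduled patch application.
print(with_defaults({}))
```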
-
-## Development Methodology
-
-For this tabletop refinement, we could not select a mathematically representative set of CVEs. The goal was to select a handful of CVEs that would cover diverse types of vulnerabilities. The CVEs that we used for our tabletop exercises are CVE-2017-8083, CVE-2019-2712, CVE-2014-5570, and CVE-2017-5753. We discussed each one from the perspective of supplier and deployer. We evaluated CVE-2017-8083 twice because our understanding and descriptions had changed materially after the first three CVEs (six evaluation exercises). After we were satisfied that the decision trees were clearly defined and captured our intentions, we began the formal evaluation of the draft trees, which we describe in the next section.
diff --git a/doc/version_1/042_treesForVulMgmt_2.md b/doc/version_1/042_treesForVulMgmt_2.md
new file mode 100644
index 00000000..89646512
--- /dev/null
+++ b/doc/version_1/042_treesForVulMgmt_2.md
@@ -0,0 +1,106 @@
+## Likely Decision Points and Relevant Data
+
+We propose the following decision points and associated values should be a factor when making decisions about vulnerability prioritization. Each decision point is tagged with the stakeholder it is relevant to: deployers, suppliers, or both. We emphasize that these descriptions are hypotheses to be further tested and validated. We made every effort to put forward informed and useful decision frameworks with wide applicability, but the goal of this paper is more to solicit feedback than make a declaration. We welcome questions, constructive criticism, refuting evidence, or supporting evidence about any aspect of this proposal.
+
+One important omission from the values for each category is an “unknown” option. Instead, we recommend explicitly identifying an option that is a reasonable assumption based on prior events. Such an option requires reliable historical evidence for what tends to be the case; of course, future events may require changes to these assumptions over time. Therefore, our assumptions require evidence and are open to debate in light of new evidence. Different risk tolerance or risk discounting postures are not addressed in the current work; accommodating such tolerance or discounting explicitly is an area for future work. This flexibility fits into our overall goal of supplying a decision-making framework that is both transparent and fits the needs of different communities. Resisting an “unknown” option discourages the modeler from silently embedding these assumptions in their choices for how the decision tree flows below the selection of any “unknown” option.
+
+We propose satisfactory decision points for vulnerability management in the next sections, in no particular order.
+
+### Exploitation (Supplier, Deployer)
+> Evidence of Active Exploitation of a Vulnerability
+
+The intent of this measure is to capture the present state of exploitation of the vulnerability. The intent is not to predict future exploitation but only to acknowledge the current state of affairs. Predictive systems, such as EPSS, could be used to augment this decision or to notify stakeholders of likely changes [@jacobs2019exploit].
+
+| | Table 4: Exploitation Decision Values |
+| --- | --------------------------------- |
+| None | There is no evidence of active exploitation and no public proof of concept (PoC) of how to exploit the vulnerability. |
+| PoC
(Proof of Concept) | One of the following cases is true: (1) exploit code sold or traded on underground or restricted fora; (2) typical public PoC in places such as Metasploit or ExploitDB; or (3) the vulnerability has a well-known method of exploitation. One example of condition (3) is that open-source web proxies serve as PoC code for how to exploit any vulnerability in the vein of improper validation of TLS certificates. As another example, Wireshark serves as a PoC for packet replay attacks on Ethernet or WiFi networks. |
+| Active | Shared, observable, reliable evidence that the exploit is being used in the wild by real attackers; there is credible public reporting. |
+
+### Technical Impact (Supplier)
+> Technical Impact of Exploiting the Vulnerability
+
+When evaluating *technical impact*, recall the scope definition above. Total control is relative to the affected component where the vulnerability resides. If a vulnerability discloses authentication or authorization credentials to the system, this information disclosure should also be scored as “total” if those credentials give an adversary total control of the component.
+
+| | Table 5: Technical Impact Decision Values |
+| ------- | ----------------------------------------- |
+| Partial | The exploit gives the adversary *limited* control over, or information exposure about, the behavior of the software that contains the vulnerability. Or the exploit gives the adversary an importantly low stochastic opportunity for total control. In this context, “low” means that the attacker cannot reasonably make enough attempts to overcome the low chance of each attempt not working. Denial of service is a form of limited control over the behavior of the vulnerable component. |
+| Total | The exploit gives the adversary *total* control over the behavior of the software, or it gives total disclosure of all information on the system that contains the vulnerability |
+
+### Utility (Supplier, Deployer)
+> The Usefulness of the Exploit to the Adversary
+
+Heuristically, we base *utility* on a combination of value density of vulnerable components and automatability of potential exploitation. This framing makes it easier to analytically derive these categories from a description of the vulnerability and the affected component. Automatability (slow or rapid) and value density (diffuse or concentrated) are defined in Sections 4.4.3.1 and 4.4.3.2. Deployers currently use this feature only as a suggested constraint on the values for *Mission Impact*.
+
+Roughly, *utility* is a combination of two things: (1) the value of each exploitation event and (2) the ease and speed with which the adversary can cause exploitation events. We define *utility* as laborious, efficient, or super effective, as described in Table 6.
+
+| | Table 6: Utility Decision Values |
+| --------------- | -------------------------------- |
+| Laborious | Slow automatability and diffuse value |
+| Efficient | {Rapid automatability and diffuse value} OR {Slow automatability and concentrated value} |
+| Super Effective | Rapid automatability and concentrated value |
+
+#### Automatability
+
+*Automatability* is described as slow or rapid:
+
+ - **Slow**. Attackers cannot reliably automate steps 1-4 of the kill chain [@hutchins2011intelligence] for this vulnerability for some reason. These steps are reconnaissance, weaponization, delivery, and exploitation. Example reasons why a step may not be reliably automatable include (1) the vulnerable component is not searchable or enumerable on the network, (2) weaponization may require human direction for each target, (3) delivery may require channels that widely deployed network security configurations block, and (4) exploitation may be frustrated by adequate exploit-prevention techniques enabled by default; ASLR is an example of an exploit-prevention tool.
+
+ - **Rapid**. Attackers can reliably automate steps 1-4 of the kill chain. If the vulnerability allows remote code execution or command injection, the default response should be rapid.
+
+Due to vulnerability chaining, there is some nuance as to whether reconnaissance can be automated. For example, consider a vulnerability A. If the systems vulnerable to A are usually not openly connected to incoming traffic ([*Exposure*](#exposure) is [small](#exposure) or [controlled](#exposure)), reconnaissance probably cannot be automated (as scans should be blocked, etc.). This fact would make automatability [slow](#automatability). However, if another vulnerability B with [rapid](#automatability) automatability can be reliably used to chain to vulnerability A, then that automates reconnaissance of vulnerable systems. In such a situation, the analyst should continue to analyze vulnerability A to understand whether the remaining steps in the kill chain can be automated.
+
+#### Value Density
+
+*Value density* is described as diffuse or concentrated:
+
+ - **Diffuse**. The system that contains the vulnerable component has limited resources. That is, the resources that the adversary will gain control over with a single exploitation event are relatively small. Examples of systems with diffuse value are email accounts, most consumer online banking accounts, common cell phones, and most personal computing resources owned and maintained by users. (A “user” is anyone whose professional task is something other than the maintenance of the system or component. As with *safety impact*, a “system operator” is anyone who is professionally responsible for the proper operation or maintenance of a system.)
+
+ - **Concentrated**. The system that contains the vulnerable component is rich in resources. Heuristically, such systems are often the direct responsibility of “system operators” rather than users. Examples of concentrated value are database systems, Kerberos servers, web servers hosting login pages, and cloud service providers. However, usefulness and uniqueness of the resources on the vulnerable system also inform value density.
For example, encrypted mobile messaging platforms may have concentrated value, not because each phone’s messaging history has a particularly large amount of data, but because it is uniquely valuable to law enforcement.
+
+The output for the *Utility* decision point is visualized in Table 7.
+
+Table 7: Utility to the Adversary, as a Combination of Automatability and Value Density
+
+| *Automatability* | *Value Density* | *Utility* |
+| ---------------- | ---------------- | --------------- |
+| **slow** | **diffuse** | laborious |
+| **slow** | **concentrated** | efficient |
+| **rapid** | **diffuse** | efficient |
+| **rapid** | **concentrated** | super effective |
+
+Alternative heuristics for proxying adversary utility are plausible. One such example is the value the vulnerability would have were it sold on the open market. Some firms, such as [Zerodium](https://zerodium.com/program.html), make such pricing structures public. The valuable exploits track the automatability and value density heuristics for the most part. Within a single system—whether it is Apache, Windows, iOS or WhatsApp—successfully automating more kill chain steps leads to higher exploit value. Remote code execution with sandbox escape and without user interaction are the most valuable exploits, and those features describe automation of the relevant kill chain steps. How equivalently virulent exploits for different systems are priced relative to each other is more idiosyncratic. Price does not only track value density of the system, but presumably also the existing supply of exploits and the installation distribution among the targets of Zerodium’s customers. Currently, we simplify the analysis and ignore these factors. However, future work should look for and prevent large mismatches between the outputs of the *utility* decision point and the exploit markets.
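
Because *utility* is a deterministic combination of its two inputs, Tables 6 and 7 can be encoded directly as a lookup. A minimal sketch in Python (the function name and value spellings are illustrative):

```python
def utility(automatability: str, value_density: str) -> str:
    """Combine automatability and value density per Tables 6 and 7."""
    table = {
        ("slow", "diffuse"): "laborious",
        ("slow", "concentrated"): "efficient",
        ("rapid", "diffuse"): "efficient",
        ("rapid", "concentrated"): "super effective",
    }
    return table[(automatability, value_density)]

assert utility("rapid", "concentrated") == "super effective"
```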
diff --git a/doc/version_1/045_treesForVulMgmt_3.md b/doc/version_1/045_treesForVulMgmt_3.md
new file mode 100644
index 00000000..674d52e1
--- /dev/null
+++ b/doc/version_1/045_treesForVulMgmt_3.md
@@ -0,0 +1,254 @@
+### Safety Impact (Supplier, Deployer)
+> Safety Impacts of Affected System Compromise
+
+We take an expansive view of safety, in which a safety violation is a violation of what the [Centers for Disease Control (CDC)](https://www.cdc.gov/hrqol/wellbeing.htm#three) calls **well-being**. Physical well-being violations are common safety violations, but we also include economic, social, emotional, and psychological well-being as important. Weighing fine differences among these categories is probably not possible, so we will not try. Each decision option lists examples of the effects that qualify for that value/answer in the various types of violations of well-being. These examples should not be considered comprehensive or exhaustive, but rather as suggestive.
+
+The stakeholder should consider the safety impact on the operators (heuristically, by “system operator” we mean those who are professionally responsible for the proper operation of the cyber-physical system, as the term is used in the safety analysis literature) and users of the software they provide. If software is repackaged and resold by a stakeholder to further downstream entities who will then sell a product, the initial stakeholder can only reasonably consider so many links in that supply chain. But a stakeholder should know its immediate consumers one step away in the supply chain. Those consumers may repackage or build on the software and then provide that product further on.
+
+We expect that a stakeholder should be aware of common usage of their software about two steps in the supply chain away. This expectation holds in both open source and proprietary contexts. Further steps along the supply chain are probably not reasonable for the stakeholder to consider consistently; however, this is not license to willfully ignore common downstream uses of the stakeholder’s software. If the stakeholder is contractually or legally responsible for safe operation of the software or cyber-physical system of which it is part, that also supersedes our rough supply-chain depth considerations. For software used in a wide variety of sectors and deployments, the stakeholder may need to estimate an aggregate safety impact. Aggregation suggests that the stakeholder’s response to this decision point cannot be less than the most severe credible safety impact, but we leave the specific aggregation method or function as a domain-specific extension for future work.
+
+#### Advice for Gathering Information to Answer the Safety Impact Question
+
+The factors that influence the safety impact level are diverse. This paper does not exhaustively discuss how a stakeholder should answer a question; that is a topic for future work. At a minimum, understanding safety impact should include gathering information about survivability of the vulnerable component, determining available operator actions to compensate for the vulnerable component, understanding relevant insurance, and determining the viability of existing backup measures. Each of these information items depends heavily on domain-specific knowledge, and so it is out of the scope of this paper to give a general-purpose strategy for how they should be included. For example, viable manual backup mechanisms are likely important in assessing the safety impact of an industrial control system in a sewage plant, but in banking the insurance structures that prevent bankruptcies are more important.
+
+The safety impact categories in Table 8 are based on hazard categories for aircraft software [@DO-178C; @faa2000safety, Section 3.3.2].
+
+Table 8: Safety Impact Decision Values
+
+| Safety Impact | Type of Harm | Description |
+| ------------- | ------------ | ----------- |
+| None | All | Does not mean no impact literally; it just means that the effect is below the threshold for all aspects described in Minor |
+| Minor (Any one or more of these conditions hold.) | Physical harm | Physical discomfort for users (not operators) of the system |
+| | Operator resiliency | Requires action by system operator to maintain safe system state as a result of exploitation of the vulnerability where operator actions would be well within expected operator abilities; OR causes a minor occupational safety hazard |
+| | System resiliency | Small reduction in built-in system safety margins; OR small reduction in system functional capabilities that support safe operation |
+| | Environment | Minor externalities (property damage, environmental damage, etc.) imposed on other parties |
+| | Financial | Financial losses, which are not readily absorbable, to multiple persons |
+| | Psychological | Emotional or psychological harm, sufficient to be cause for counselling or therapy, to multiple persons |
+| Major (Any one or more of these conditions hold.) | Physical harm | Physical distress and injuries for users (not operators) of the system |
+| | Operator resiliency | Requires action by system operator to maintain safe system state as a result of exploitation of the vulnerability where operator actions would be within their capabilities but the actions require their full attention and effort; OR significant distraction or discomfort to operators; OR causes significant occupational safety hazard |
+| | System resiliency | System safety margin effectively eliminated but no actual harm; OR failure of system functional capabilities that support safe operation |
+| | Environment | Major externalities (property damage, environmental damage, etc.) imposed on other parties |
+| | Financial | Financial losses that likely lead to bankruptcy of multiple persons |
+| | Psychological | Widespread emotional or psychological harm, sufficient to be cause for counselling or therapy, to populations of people |
+| Hazardous (Any one or more of these conditions hold.) | Physical harm | Serious or fatal injuries, where fatalities are plausibly preventable via emergency services or other measures |
+| | Operator resiliency | Actions that would keep the system in a safe state are beyond system operator capabilities, resulting in adverse conditions; OR great physical distress to system operators such that they cannot be expected to operate the system properly |
+| | System resiliency | Parts of the cyber-physical system break; system’s ability to recover lost functionality remains intact |
+| | Environment | Serious externalities (threat to life as well as property, widespread environmental damage, measurable public health risks, etc.) imposed on other parties |
+| | Financial | Socio-technical system (elections, financial grid, etc.) of which the affected component is a part is actively destabilized and enters unsafe state |
+| | Psychological | N/A |
+| Catastrophic (Any one or more of these conditions hold.) | Physical harm | Multiple immediate fatalities (Emergency response probably cannot save the victims.) |
+| | Operator resiliency | Operator incapacitated (includes fatality or otherwise incapacitated) |
+| | System resiliency | Total loss of whole cyber-physical system, of which the software is a part |
+| | Environment | Extreme externalities (immediate public health threat, environmental damage leading to small ecosystem collapse, etc.) imposed on other parties |
+| | Financial | Social systems (elections, financial grid, etc.) supported by the software collapse |
+| | Psychological | N/A |
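
The severity levels in Table 8 are ordered, so the aggregation rule suggested above (the response cannot be less than the most severe credible safety impact) reduces to taking a maximum over that order. A minimal sketch, assuming an illustrative Python encoding of the levels:

```python
from enum import IntEnum

class SafetyImpact(IntEnum):
    """Ordered severity levels from Table 8; the numeric encoding is illustrative."""
    NONE = 0
    MINOR = 1
    MAJOR = 2
    HAZARDOUS = 3
    CATASTROPHIC = 4

def aggregate_safety_impact(impacts):
    """Aggregate across deployments: never less severe than the most
    severe credible safety impact."""
    return max(impacts, default=SafetyImpact.NONE)

assert aggregate_safety_impact(
    [SafetyImpact.MINOR, SafetyImpact.HAZARDOUS]
) is SafetyImpact.HAZARDOUS
```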
+
+#### Public Safety Impact (Supplier)
+
+Suppliers necessarily have a rather coarse-grained perspective on the broadly defined safety impacts described above. Therefore we simplify the above into a binary categorization: _Significant_ is when any impact meets the criteria for an impact of Major, Hazardous, or Catastrophic in the above table. _Minimal_ is when none do.
+
+| | Table X: Public Safety Impact |
+| ----------- | ---------------------------------------------------|
+| Minimal | Safety Impact of None or Minor |
+| Significant | Safety Impact of Major, Hazardous, or Catastrophic |
+
+#### Situated Safety Impact (Deployer)
+
+Deployers are anticipated to have a more fine-grained perspective on the safety impacts broadly defined in Table 8. However, in order to simplify implementation for deployers we intend to combine this with Mission Impact below, so we defer the topic for now.
+
+### Mission Impact (Deployer)
+> Impact on Mission Essential Functions of the Organization
+
+A **mission essential function (MEF)** is a function “directly related to accomplishing the organization’s mission as set forth in its statutory or executive charter” [@FCD2_2017, page A-1]. Identifying MEFs is part of business continuity planning or crisis planning. The rough difference between MEFs and non-essential functions is that an organization “must perform a\[n MEF\] during a disruption to normal operations” [@FCD2_2017, page B-2]. The mission is the reason an organization exists, and MEFs are how that mission is affected. Non-essential functions do not directly support the mission per se; however, they support the smooth delivery or success of MEFs. Financial losses—especially to publicly traded for-profit corporations—are covered here because a (legally mandated) mission of such corporations is financial performance.
+
+| | Table 10: Mission Impact Decision Values |
+| ---------------------- | ---------------------------------------- |
+| None / Non-Essential Degraded | Little to no impact up to degradation of non-essential functions; chronic degradation would eventually harm essential functions |
+| MEF Support Crippled | Activities that directly support essential functions are crippled; essential functions continue for a time |
+| MEF Failure | Any one mission essential function fails for a period of time longer than acceptable; overall mission of the organization degraded but can still be accomplished for a time |
+| Mission Failure | Multiple or all mission essential functions fail; ability to recover those functions degraded; organization’s ability to deliver its overall mission fails |
+
+#### Advice for Gathering Information to Answer the Mission Impact Question
+
+The factors that influence the mission impact level are diverse. This paper does not exhaustively discuss how a stakeholder should answer a question; that is a topic for future work. At a minimum, understanding mission impact should include gathering information about the critical paths that involve vulnerable components, viability of contingency measures, and resiliency of the systems that support the mission. There are various sources of guidance on how to gather this information; see for example the FEMA guidance in Continuity Directive 2 [@FCD2_2017] or OCTAVE FORTE [@tucker2018octave]. This is part of risk management more broadly.
### Situated Safety / Mission Impact (Deployer)

In pilot implementations of SSVC, we received feedback that organizations tend to think of mission and safety impacts as if they were combined into a single factor: in other words, the priority increases regardless of which of the two impact factors is increased. We therefore combine Situated Safety and Mission Impact for deployers into a single Potential Impact factor as a dimension reduction step, as follows.
We observe that the day-to-day operations of an organization often have a built-in degree of tolerance to small-scale variance in mission impacts.
Thus, in our opinion, we need only concern ourselves with discriminating well at the upper end of the scale.
Therefore we combine the three lesser mission impacts of none, non-essential degraded, and MEF support crippled into a single category, while retaining the distinction between MEF Failure and Mission Failure at the extreme.
This gives us three levels of mission impact to work with.

On the other hand, most organizations' tolerance for variance in safety tends to be lower, meaning that even small deviations in safety are unlikely to go unnoticed or unaddressed.
We suspect that the presence of regulatory oversight for safety issues, and its absence at the lower end of the mission impact scale, influences this behavior.
Because of this higher sensitivity to safety concerns, we chose to retain a four-level resolution for the safety dimension. We then combine Mission Impact with Situated Safety Impact and map these onto a four-tiered scale (Low, Medium, High, Very High). The mapping is shown in Table X, with a small sketch of the lookup following the table.

| | Table X: Situated Safety / Mission Impact Decision Values | | |
| --------------------------------------------------- | ---------------------- | ----------- | --------------- |
| *Safety Impact* (rows) / *Mission Impact* (columns) | None/Degraded/Crippled | MEF Failure | Mission Failure |
| None/Minor | Low | Medium | Very High |
| Major | Medium | High | Very High |
| Hazardous | High | High | Very High |
| Catastrophic | Very High | Very High | Very High |
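The sketch below encodes the table as a lookup. The dictionary layout, string labels, and function name are illustrative assumptions; the mapped values come directly from the table.

```python
# Illustrative encoding of the Table X mapping (not normative SSVC tooling).
POTENTIAL_IMPACT = {
    # safety impact:  (None/Degraded/Crippled, MEF Failure, Mission Failure)
    "none/minor":   ("Low",       "Medium",    "Very High"),
    "major":        ("Medium",    "High",      "Very High"),
    "hazardous":    ("High",      "High",      "Very High"),
    "catastrophic": ("Very High", "Very High", "Very High"),
}
MISSION_COLUMNS = {"none/degraded/crippled": 0, "mef failure": 1, "mission failure": 2}

def potential_impact(safety: str, mission: str) -> str:
    """Combined Situated Safety / Mission Impact for the deployer."""
    return POTENTIAL_IMPACT[safety.lower()][MISSION_COLUMNS[mission.lower()]]

# For example, a Hazardous safety impact dominates a low mission impact:
assert potential_impact("Hazardous", "None/Degraded/Crippled") == "High"
```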
### System Exposure (Deployer)
> The Accessible Attack Surface of the Affected System or Service

Measuring attack surface precisely is difficult, and we do not propose to perfectly delineate between *small* and *controlled* access.
Exposure should be judged against the system in its deployed context, which may differ from how it is commonly expected to be deployed.
For example, the exposure of a device on a vehicle's CAN bus will vary depending on the presence of a cellular telemetry device on the same bus.

If a vulnerability cannot be patched, other mitigations may be used.
Usually, the effect of these mitigations is to reduce the exposure of the vulnerable component.
Therefore, a deployer’s answer to Exposure may change if such mitigations are put in place.
If a mitigation changes exposure and thereby reduces the priority of a vulnerability, that mitigation can be considered a success.
Whether that mitigation allows the deployer to defer further action varies from case to case.

| | Table 9: Exposure Decision Values |
| ----------- | --------------------------------- |
| Small | Local service or program; highly controlled network |
| Controlled | Networked service with some access restrictions or mitigations already in place (whether locally or on the network). A successful mitigation must reliably interrupt the adversary’s attack, which requires that the attack be detected both reliably and quickly enough to respond. *Controlled* covers the situation in which a vulnerability can be exploited by chaining it with other vulnerabilities. The assumption is that the number of steps in the attack path is relatively low; if the path is long enough that it is implausible for an adversary to reliably execute it, then *exposure* should be *small*. |
| Open | Internet or another widely accessible network where access cannot plausibly be restricted or controlled (e.g., DNS servers, web servers, VOIP servers, email servers) |
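To illustrate how a deployed mitigation might change an *Exposure* assessment, here is a small sketch under our own assumptions; the single-step downgrade from *open* to *controlled*, and all names, are illustrative choices rather than SSVC policy.

```python
from enum import IntEnum

class Exposure(IntEnum):
    SMALL = 0
    CONTROLLED = 1
    OPEN = 2

def reassess_exposure(base: Exposure, interrupts_attack_reliably: bool,
                      detection_fast_enough: bool) -> Exposure:
    """Re-evaluate Exposure after deploying a mitigation.

    Per Table 9, a mitigation supports a *controlled* rating only if it
    reliably interrupts the adversary's attack, which in turn requires
    detection that is both reliable and fast enough to respond.
    """
    if base is Exposure.OPEN and interrupts_attack_reliably and detection_fast_enough:
        return Exposure.CONTROLLED
    return base

# Example: an internet-facing service behind a newly deployed, reliably
# enforced access restriction could be re-rated as controlled.
assert reassess_exposure(Exposure.OPEN, True, True) is Exposure.CONTROLLED
```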
diff --git a/doc/version_1/047_treesForVulMgmt_4.md b/doc/version_1/047_treesForVulMgmt_4.md
new file mode 100644
index 00000000..772e7219
--- /dev/null
+++ b/doc/version_1/047_treesForVulMgmt_4.md
@@ -0,0 +1,55 @@
## Relationship to asset management

Our method is for prioritizing vulnerabilities based on the risk stemming from exploitation. There are other reasonable asset management considerations that may influence remediation timelines. There are at least three aspects of asset management that may be important but are out of scope for SSVC. First and most obvious is the transaction cost of conducting the mitigation or fix. System administrators are paid to develop or apply any fixes or mitigations, and there may be other transactional costs such as downtime for updates. Second is the risk of the fix or mitigation introducing a new error or vulnerability. Regression testing is part of managing this type of risk. Finally, there may be an operational cost of applying a fix or mitigation, representing an ongoing change of functionality or increased overhead. A decision maker could, for example, order work within one SSVC priority class (scheduled, out-of-cycle, etc.) based on these asset management considerations. Once the organization fixes all the high-priority vulnerabilities, it can then fix the medium-level vulnerabilities with the same effort it spent on the high-priority ones.

Asset management and risk management also drive some of the up-front work an organization would need to do to gather some of the necessary information. This situation is not new; an asset owner cannot prioritize which fixes to deploy to its assets if it does not know what assets it owns and where they are located. The organization can pick its choice of tools for these things; there are about 200 asset management tools on the market [@captera]. Standards like the Software Bill of Materials (SBOM) [@manion2019sbom] would likely reduce the burden of asset management, but these are still maturing. If an organization does not have asset management or risk management (see Section 4.4.6.1) plans and processes in place, it will have a non-trivial amount of work to do to establish them before it can take full advantage of SSVC.

## Supplier Tree

Figure 1 shows the proposed prioritization decision tree for the supplier. Both the supplier and deployer trees use the above decision point definitions. Each tree is a compact way of expressing assertions or hypotheses about the relative priority of different situations, and each organizes how we propose a stakeholder should treat those situations. Rectangles are decision points, and triangles represent outcomes. The values for each decision point are as described above. Outcomes are priority decisions (defer, scheduled, out-of-cycle, immediate); outcome triangles are color coded:

  - Defer = gray with green outline
  - Scheduled = yellow
  - Out-of-Cycle = orange
  - Immediate = red with black outline

Figure 1: Proposed Vulnerability Prioritization Decision Tree for Patch Supplier

## Deployer Tree

The proposed deployer tree is depicted in Figure 2, Figure 3, and Figure 4. The state of *Exploitation* is the first decision point, but in an effort to make the tree legible, we split the tree into three sub-trees over three pages. We suggest making the decision about *Exploitation* as usual, and then going to the correct sub-tree.

Figure 2: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (continued in Figure 3 and Figure 4)

Figure 3: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (continued from Figure 2 and in Figure 4)

Figure 4: Proposed Vulnerability Prioritization Decision Tree for Patch Deployers (continued from Figure 2 and Figure 3)

## Evidence Gathering Guidance

To answer each of these decision points, a supplier or deployer should, as much as possible, have a repeatable evidence collection and evaluation process. However, we are proposing decisions for humans to make, so evidence collection and evaluation is not fully automatable. That caveat notwithstanding, some automation is possible.

For example, whether exploitation modules are available in ExploitDB, Metasploit, or other sources is a straightforward check. We hypothesize that searching GitHub and Pastebin for exploit code should be automatable. A supplier or deployer could then define *Exploitation* **PoC available** to be positive search results, in at least one of these venues, for a set of inputs derived from the CVE entry. This definition applies at least to those vulnerabilities that are not “automatically” PoC-ready, such as vulnerabilities exploitable by on-path attackers against TLS or by network replay.

Some of the decision points require substantial upfront analysis effort to gather risk assessment or organizational data. However, once gathered, this information can be efficiently reused across many vulnerabilities and refreshed only occasionally. An obvious example of this is the mission impact decision point. To answer it, a deployer must analyze their essential functions, how they interrelate, and how they are supported. Exposure is similar; answering that decision point requires an asset inventory, an adequate understanding of the network topology, and a view of the enforced security controls. Independently operated scans, such as Shodan or Shadowserver, may play a role in evaluating exposure, but the entire exposure question cannot be reduced to a binary question of whether an organization’s assets appear in such databases. Once the deployer has the situational awareness to understand MEFs or exposure, selecting the answer for each individual vulnerability is usually straightforward.

Stakeholders who use the prioritization method should consider releasing the priority with which they handled the vulnerability. This disclosure has various benefits. For example, if the supplier publishes a priority ranking, then deployers could consider that in their own decision-making. One reasonable way to use it is to break ties for the deployer: if a deployer has three “scheduled” vulnerabilities to patch, they may address them in any order, but if two were treated by the supplier as “scheduled” patches and one as “out-of-cycle,” the deployer may want to use that information to favor the latter.

In the case where no information is available or the organization has not yet matured its initial situational analysis, we can suggest defaults for some decision points. If the deployer does not know their exposure, that means they do not know where the devices are or how they are controlled, so they should assume *Exposure* is **open**. If the decision maker knows nothing about the environment in which the device is used, we suggest assuming a **major** *Safety Impact*. This position is conservative, but software is now so thoroughly embedded in daily life that we suggest the burden is on the decision maker to provide evidence that no one’s well-being will suffer; the reach of software exploits is no longer limited to a research network. Similarly, with *Mission Impact*, the deployer should assume that the software is in use at the organization for a reason and that it supports essential functions unless they have evidence otherwise; with a total lack of information, assume **MEF support crippled** as a default. *Exploitation* needs no special default: if adequate searches are made for exploit code and none is found, the answer is **none**. The decision set {**none**, **open**, **MEF support crippled**, **major**} results in a scheduled patch application, as the sketch below illustrates.
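A minimal sketch of these defaults follows, assuming a simple dictionary representation. The layout and function name are our illustrative assumptions; the default values and the scheduled outcome come from the text above.

```python
# Illustrative sketch of the suggested deployer defaults (not normative
# SSVC tooling). The default values and the "scheduled" outcome are taken
# from the text above; the representation is an assumption.
DEPLOYER_DEFAULTS = {
    "Exploitation": "none",                   # adequate searches found nothing
    "SystemExposure": "open",                 # device locations/controls unknown
    "MissionImpact": "MEF support crippled",  # assume the software matters
    "SafetyImpact": "major",                  # conservative well-being assumption
}

def with_defaults(assessed: dict) -> dict:
    """Fill unanswered decision points with the suggested defaults."""
    merged = dict(DEPLOYER_DEFAULTS)
    merged.update({k: v for k, v in assessed.items() if v is not None})
    return merged

# With no information at all, the defaulted decision set
# {none, open, MEF support crippled, major} corresponds to a
# scheduled patch application in the deployer tree.
assert with_defaults({}) == DEPLOYER_DEFAULTS
```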
## Development Methodology

For this tabletop refinement, we could not select a mathematically representative set of CVEs. The goal was instead to select a handful of CVEs covering diverse types of vulnerabilities. The CVEs we used for our tabletop exercises are CVE-2017-8083, CVE-2019-2712, CVE-2014-5570, and CVE-2017-5753. We discussed each one from the perspective of both supplier and deployer. We evaluated CVE-2017-8083 twice because our understanding and descriptions had changed materially after the first three CVEs (six evaluation exercises). After we were satisfied that the decision trees were clearly defined and captured our intentions, we began the formal evaluation of the draft trees, which we describe in the next section.