Security and Privacy Research highlights

Measuring Security Practices

By Louis F. DeKoven, Audrey Randall, Ariana Mirian, Gautam Akiwate, Ansel Blume, Lawrence K. Saul, Aaron Schulman, Geoffrey M. Voelker, and Stefan Savage

Posted Sep 1 2022

Abstract

Users are encouraged to adopt a wide array of technologies and behaviors to reduce their security risk. However, the adoption of these “best practices,” ranging from the use of antivirus products to keeping software updated, is not well understood, nor is their practical impact on security risk well established. To explore these issues, we conducted a large-scale measurement of 15,000 computers over six months. We use passive monitoring to infer and characterize the prevalence of various security practices as well as a range of other potentially security-relevant behaviors. We then explore the extent to which differences in key security behaviors affect real-world security outcomes (i.e., whether a device shows clear evidence of having been compromised).

1. Introduction

Our existing models of security all rely on end users to follow a range of best practices; for example, the rapid installation of security updates to patch vulnerabilities. Implicit in this status quo is the recognition that security is not an intrinsic property of today’s systems, but is a byproduct of making appropriate choices—choices about what security products to employ, choices about how to manage system software, and choices about how to engage (or not) with third-party services on the Internet.

However, the value provided by these practices is underexamined at best. First, we have limited empirical data about which security advice is adopted in practice. Users have a plethora of advice to choose from, as highlighted by Reeder et al.’s recent study of expert security advice, whose title, “152 Simple Steps to Stay Safe Online,” underscores both the irony and the variability in such security lore.²⁰ A second, more subtle issue concerns the efficacy of such practices when followed: Do they work? Here the evidence is also scant. Even practices widely agreed upon by Reeder’s experts, such as keeping software patched, are not well justified beyond a rhetorical argument. In fact, virtually all established “security best practices” are of this nature and, as summarized by Herley, their “benefit is largely speculative or moot.”¹⁰

This paper seeks to make progress on both issues, namely the prevalence of popular security practices and their relationship to security outcomes, via longitudinal empirical measurement of a large population of computers. In particular, we perform a preliminary study based on monitoring the online behavior of 15,291 independently administered desktop and laptop computers. We identify per-device security behaviors: what software devices run (e.g., antivirus products, password managers), how that software is patched, and how they use the network (e.g., whether the machine contacts file sharing sites). We also identify concrete security outcomes (i.e., whether a particular machine becomes compromised). In the course of this work, we describe three primary contributions:

A large-scale passive feature collection: we develop and test a large dictionary of classification rules to infer software state on monitored machines (e.g., that a machine is using an antivirus of a particular brand).
An outcome-based analysis: we show how to use concrete evidence of security outcomes (operational security logs and network intrusion detection alerts) to identify the subset of machines in our dataset that are truly compromised (and not merely exhibiting “risky” behavior).
Prevalence and impact of security practices: for our user population, we establish the prevalence of a range of popular security practices as well as how these behaviors relate to security outcomes. We specifically explore the hypotheses that a range of existing “best practices” and “bad behaviors” are correlated with host compromise.

Using this approach, we identify a number of “bad behaviors” that are positively correlated with host compromise, but find few “best practices” exhibiting strong negative correlations that would support their clear value in improving end user security.

2. Background

Ours is far from the first research to empirically explore the security risks associated with user behavior. Although space does not allow a full exploration of related work, we highlight representative examples of past major efforts here.

Among the earliest of these studies is the work of Carlinet et al., which also used passive network analysis (albeit at a much smaller scale) to relate machine characteristics (e.g., operating system type) to security alerts. More recently, other researchers have specifically investigated how a user’s Web browsing habits reveal risk factors, notably Canali et al.’s⁴ study of antivirus telemetry (100,000 users) and Sharif et al.’s²² analysis of 20,000 mobile users. Both found that frequent, nighttime, and weekend browsing activity are correlated with security risk.

Another important vein of research has correlated poor software update habits with indicators of host compromise. Khan et al.¹³ used passive monitoring to demonstrate a positive correlation between infection indicators and lack of regular updating over a population of 5000 hosts. At a larger scale, Bilge et al.³ used antivirus logs and telemetry from over 600,000 enterprise hosts to retrospectively relate such software updating practices to subsequent infections.

Finally, there is an extensive literature on the human factors issues involved in relating security advice to users, the extent to which the advice leads to changes in behaviors, and how such effects are driven by both individual self-confidence and cultural norms.⁸,¹⁸,¹⁹,²¹,²³

3. Methodology

Our measurement methodology uses passive network traffic monitoring to infer the security and behavioral practices of devices within a university residential network. In this section, we first focus on the technical aspects of our data collection methodology and then discuss some of its attendant challenges and limitations.

3.1. Network traffic processing

The first stage of our system takes as input 4–6 Gbps of raw bidirectional network traffic from the campus residential network and outputs logs of processed network events at a rate of millions of records per second. In this stage, campus IP addresses are anonymized. To track the mapping of IP addresses to device MAC addresses over time, this stage also collects dynamic host configuration protocol (DHCP) syslog traffic and anonymizes it consistently with the network logs.

3.1.1. Residential network traffic

As shown in the network traffic processing stage of Figure 1, our server receives network traffic mirrored from a campus Arista switch using two 10G fiber optic links. In addition to load balancing, the switch filters out high-volume traffic from popular content delivery networks (CDNs) (e.g., Netflix, YouTube, Akamai, etc.), resulting in a load of 4–6 Gbps of traffic on our server.

Figure 1. System architecture. Network traffic is first processed into logs and its addresses are anonymized. The next stage replays the network traffic logs to extract further information and label each connection with (also anonymized) MAC address information. The decorated logs are then stored in Hive where they are labeled with security incidents, security practice features, and behavioral features. Lastly, device models are created for analysis.

Although intrusion detection systems (IDSes) are typically used for detecting threats and anomalous network behavior, we use the Zeek IDS (formerly known as Bro) to convert network traffic into logs. We chose it because it is extensible, discards raw network traffic as soon as a connection is closed (or after a timeout), and can parse numerous network protocols. We also customize the Zeek output logs to record only the information needed to identify security practice and behavioral features.

Every 30 minutes, Zeek rotates the previous logs through an address anonymization filter that encrypts the campus IP addresses. At this stage of processing, the logs contain IP addresses but not MAC addresses, because DHCP traffic is not propagated to our network vantage point. Once anonymized, the logs are rotated across the DMZ to another server for further processing (Section 3.2).

3.1.2. DHCP traffic

The server also runs a syslog collector that receives forwarded DHCP traffic from the residential network’s DHCP servers. DHCP dynamically assigns an IP address to a device joining the network. The address is leased to the device (identified by MAC address) for a specified duration, typically 15 minutes. Since we need to track a device’s security and behavioral practices over long periods, we use this IP-to-MAC mapping in later processing.

As with the Zeek logs, every 30 minutes we process the previous DHCP traffic into (MAC address, IP address, starting time, lease duration) tuples. Then the entire IP address and the device-identifying lower 24 bits of the MAC address are encrypted using a similar address anonymization filter. The anonymized DHCP logs are then rotated across the DMZ to the Log Decoration server.
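The anonymization step can be illustrated with a small keyed permutation. The sketch below is a textbook balanced Feistel network with an HMAC-SHA256 round function. It is not the format-preserving encryption scheme the authors cite (Section 3.4), and it has not been vetted for production use. It assumes, as described above, that the upper 24 bits of a MAC address (the vendor OUI) stay in the clear, while the lower 24 bits and the full IPv4 address are encrypted. The key and all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def _round_value(key: bytes, round_index: int, half: int, half_bits: int) -> int:
    """Pseudorandom round function: HMAC-SHA256 over (round, half), truncated to half_bits."""
    msg = round_index.to_bytes(1, "big") + half.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << half_bits) - 1)


def feistel_encrypt(value: int, bits: int, key: bytes, rounds: int = 8) -> int:
    """Keyed permutation of [0, 2**bits): output has the same bit width (bits must be even)."""
    half_bits = bits // 2
    left, right = value >> half_bits, value & ((1 << half_bits) - 1)
    for i in range(rounds):
        left, right = right, left ^ _round_value(key, i, right, half_bits)
    return (left << half_bits) | right


def feistel_decrypt(value: int, bits: int, key: bytes, rounds: int = 8) -> int:
    """Inverse of feistel_encrypt: undo the rounds in reverse order."""
    half_bits = bits // 2
    left, right = value >> half_bits, value & ((1 << half_bits) - 1)
    for i in reversed(range(rounds)):
        left, right = right ^ _round_value(key, i, left, half_bits), left
    return (left << half_bits) | right


def anonymize_mac(mac: str, key: bytes) -> str:
    """Keep the 24-bit OUI (vendor prefix) and encrypt the lower 24 device-specific bits."""
    value = int(mac.replace(":", ""), 16)
    oui, device = value >> 24, value & 0xFFFFFF
    anon = (oui << 24) | feistel_encrypt(device, 24, key)
    return ":".join(f"{b:02x}" for b in anon.to_bytes(6, "big"))


def anonymize_ipv4(ip: str, key: bytes) -> str:
    """Encrypt the entire 32-bit IPv4 address."""
    value = int(ipaddress.IPv4Address(ip))
    return str(ipaddress.IPv4Address(feistel_encrypt(value, 32, key)))
```

The mapping is a keyed permutation over the same bit width. As a result, anonymized values remain valid MAC and IPv4 addresses, and the same input always yields the same output, so records from the DHCP and traffic logs can still be joined.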

3.2. Log Decoration

The second stage takes as input these intermediate network events and DHCP logs and processes them further to produce a single stream of network events associated with (anonymized) device MAC addresses and domain names.

Associating flows to devices. Our goal is to model device behavior based on network activity over long time spans. Although we identify unique devices by their MAC addresses, the network events we collect carry dynamically assigned IP addresses. As a result, we build a dynamic IP address assignment cache to map IP-based network events to a specific device’s MAC address.
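A minimal sketch of such a cache follows. It assumes the DHCP records arrive as the (MAC address, IP address, starting time, lease duration) tuples described in Section 3.1.2, and the class and field names are hypothetical. Each IP address keeps its leases sorted by start time. Mapping a network event to a device is then a binary search for the latest lease that began at or before the event and had not yet expired.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Lease:
    mac: str          # anonymized MAC address
    ip: str           # anonymized IP address
    start: float      # lease start, seconds since the epoch
    duration: float   # lease length in seconds


class LeaseCache:
    """Maps (IP address, timestamp) to the MAC address that held the lease at that time."""

    def __init__(self) -> None:
        self._starts: Dict[str, List[float]] = defaultdict(list)  # ip -> sorted start times
        self._leases: Dict[str, List[Lease]] = defaultdict(list)  # ip -> leases, same order

    def add(self, lease: Lease) -> None:
        starts = self._starts[lease.ip]
        i = bisect.bisect_right(starts, lease.start)
        starts.insert(i, lease.start)
        self._leases[lease.ip].insert(i, lease)

    def lookup(self, ip: str, ts: float) -> Optional[str]:
        starts = self._starts.get(ip)
        if not starts:
            return None
        i = bisect.bisect_right(starts, ts) - 1   # latest lease that began at or before ts
        if i < 0:
            return None
        lease = self._leases[ip][i]
        return lease.mac if ts < lease.start + lease.duration else None  # None if expired
```

A DHCP renewal simply appears as a new lease with a later start time. The most recent lease therefore takes precedence for events after the renewal.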

Associating flows to domains. When using network activity to model a device’s behavior, it is useful to know the domain names of the endpoints that devices communicate with (e.g., to categorize the type of Website being visited). Because the network events we observe use IP addresses, we must map IP addresses to domain names. The mapping of domain name system (DNS) names to IP addresses also changes over time, so we dynamically track DNS resolutions as observed in the network to map network events to the domain names involved. We also extract the registered domain and top-level domain (TLD) from each fully qualified domain name using the Public Suffix List.¹⁵
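The sketch below illustrates both steps under simplifying assumptions. It extracts the registered domain by matching the longest public suffix from a tiny, hypothetical excerpt of the Public Suffix List, ignoring the list’s wildcard and exception rules. It labels flows by remembering which name most recently resolved to each server IP. Keying only on the server IP and ignoring DNS TTLs are assumptions made for brevity, and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Tiny, hypothetical excerpt of the Public Suffix List. The real list has thousands
# of rules, including wildcard (*.) and exception (!) rules that this sketch ignores.
PUBLIC_SUFFIXES: Set[str] = {"com", "org", "edu", "uk", "co.uk", "github.io"}


def registered_domain(fqdn: str) -> Optional[str]:
    """Longest matching public suffix plus one label, e.g., 'news.bbc.co.uk' -> 'bbc.co.uk'."""
    labels = fqdn.lower().rstrip(".").split(".")
    for i in range(len(labels)):                # i = 0 tries the longest candidate suffix
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[i - 1:]) if i > 0 else None  # a bare suffix has no owner
    return None


class DnsTracker:
    """Remembers which name most recently resolved to each server IP address."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def observe_response(self, qname: str, answer_ips: List[str]) -> None:
        name = qname.lower().rstrip(".")
        for ip in answer_ips:
            self._names[ip] = name              # later resolutions overwrite earlier ones

    def label_flow(self, server_ip: str) -> Optional[Tuple[str, Optional[str], str]]:
        """Return (FQDN, registered domain, TLD) for a flow's server IP, if known."""
        fqdn = self._names.get(server_ip)
        if fqdn is None:
            return None
        return fqdn, registered_domain(fqdn), fqdn.rsplit(".", 1)[-1]
```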

User agent. We parse HTTP user agent strings using the open-source ua-parser library. From the user agent string, we extract browser, operating system (OS), and device information when present.

3.3. Feature extraction

In the final stage of our system, we store the log events in a Hive database and process them to extract a wide variety of software and network activity features associated with the devices and their activities as seen on our network. The last critical feature is device outcomes: knowing when a device has become compromised. We derive device outcomes from a log of alerts from a campus IDS appliance, and also store that information in our database.

3.3.1. Software features

To identify the features describing application use on devices, we crafted custom network traffic signatures to identify application use (e.g., a particular peer-to-peer client) as well as various kinds of application behavior (e.g., a software update). To create our network signatures, we use virtual machines instrumented with Wireshark. We manually exercise various applications and monitor each machine’s network behavior to derive a unique signature for each application. Fortunately, most applications associated with security risk frequently reveal their presence when checking for updates. In total, we develop network signatures for 68 different applications, including OSs. For a subset of applications, we can also detect the application’s version. This allows us to measure how finer-grained security practices (e.g., updating regularly) correlate with device compromise.
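A dictionary of classification rules of this kind might be represented as follows. This is a sketch, not the authors’ rule set. The application names, host patterns, and version-capturing regular expressions are hypothetical placeholders; real signatures would be derived from the instrumented virtual machines described above. Each rule matches the requested host name and can optionally capture a version number from the URI or User-Agent string.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Signature:
    app: str                                      # application label, e.g., an antivirus brand
    host_pattern: re.Pattern                      # matched against the HTTP Host header / TLS SNI
    version_pattern: Optional[re.Pattern] = None  # optional capture group on URI or User-Agent


# Hypothetical rules for illustration only.
SIGNATURES: List[Signature] = [
    Signature("ExampleAV", re.compile(r"(^|\.)update\.example-av\.com$"),
              re.compile(r"/defs/v(\d+(?:\.\d+)*)/")),
    Signature("ExampleP2P", re.compile(r"(^|\.)tracker\.example-p2p\.net$")),
]


def classify(host: str, uri: str = "", user_agent: str = "") -> List[Tuple[str, Optional[str]]]:
    """Return (application, version or None) for every signature the request matches."""
    matches = []
    for sig in SIGNATURES:
        if not sig.host_pattern.search(host.lower()):
            continue
        version = None
        if sig.version_pattern is not None:
            m = sig.version_pattern.search(uri) or sig.version_pattern.search(user_agent)
            version = m.group(1) if m else None
        matches.append((sig.app, version))
    return matches
```

For example, `classify("update.example-av.com", "/defs/v12.4.1/daily.bin")` would attribute the request to the hypothetical ExampleAV at version 12.4.1.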

Antivirus software. Using antivirus software is virtually always recommended. We created network signatures for 12 popular antivirus products, seven of which were recognized as offering the “Best Protection” for 2019.¹⁶

Operating system. We created six signatures to identify the OSes running on devices. As regular OS updating is a popular recommended security practice, we also created signatures to detect OS updates. Windows and Mac OS updates are downloaded from CDNs whose traffic is filtered out before it reaches our system (Section 3.1). Even so, we can infer that updates have taken place from the OS version information in the Host header and User-Agent string of HTTP traffic.
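One way to turn version sightings into update events is sketched below. It assumes each device yields a series of OS versions parsed from its User-Agent strings, and the function names are hypothetical. A version increase is treated as an update. Sightings of an older version afterward (for example, from an application reporting a stale User-Agent) are ignored. Mac User-Agent strings write versions with underscores (e.g., 10_14_1), so versions are normalized to dotted form first.

```python
from typing import Iterable, List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'10.14.1' -> (10, 14, 1); the underscore form '10_14_1' is normalized first."""
    return tuple(int(part) for part in version.replace("_", ".").split("."))


def infer_updates(sightings: Iterable[Tuple[float, str]]) -> List[Tuple[float, str, str]]:
    """Return (time, old version, new version) for each increase in observed OS version."""
    updates: List[Tuple[float, str, str]] = []
    current: Optional[str] = None
    for ts, raw in sorted(sightings):           # process sightings in time order
        version = raw.replace("_", ".")
        if current is None:
            current = version
        elif parse_version(version) > parse_version(current):
            updates.append((ts, current, version))
            current = version
        # Older versions seen later (e.g., a stale User-Agent) are ignored.
    return updates
```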

Applications. Through a combination of network and User-Agent string signatures, we detect 41 applications, including those commonly perceived as risky such as Adobe Flash Player, Adobe Reader, Java, Tor, peer-to-peer (P2P) applications, and more. We also detect other popular applications, including browsers, Spotify, iTunes, Outlook, Adobe AIR, etc.

Password managers. As password managers are frequently recommended to limit the collateral damage from leaked passwords, we also crafted network signatures for nine popular password managers.⁵

3.3.2. Network activity

We track a wide variety of network activity features that quantify the protocols used (e.g., HTTP and HTTPS), the categories of sites visited (e.g., file sharing services), when devices are most active, and so on. In doing so, we implement a set of features similar to those used by Canali et al.⁴ and Sharif et al.²², which focused on Web browsing activity. Because our dataset also includes traffic beyond HTTP, we can measure additional behaviors (e.g., remote DNS resolver usage and HTTPS traffic usage).

Content categorization. We use the IAB Tech Lab Content Taxonomy to categorize every registered domain in our dataset.¹² The domain categorization was generously provided by Webshrinker.²⁵ The IAB taxonomy includes 404 distinct domain categories.²⁴ We use the domain categorization to measure the fraction of unique domains each device accesses in a specific category. We also built a list of file hosting sites and URL shortening services that we use to identify when a device accesses these types of services.
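Assuming each registered domain maps to a single IAB category (the taxonomy can assign several), the per-device feature might be computed as below. Domains missing from the categorization are counted as uncategorized. The category labels in the test are illustrative, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def category_fractions(domains: Iterable[str], categories: Dict[str, str]) -> Dict[str, float]:
    """Fraction of a device's distinct registered domains that fall in each content category."""
    distinct = set(domains)                     # repeated visits to a domain count once
    if not distinct:
        return {}
    counts = Counter(categories.get(d, "Uncategorized") for d in distinct)
    return {cat: n / len(distinct) for cat, n in counts.items()}
```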

Usage patterns. We also develop a number of behavioral features that describe the quantities of HTTP and HTTPS traffic in each TLD and the number of network requests made. Additionally, we develop features that quantify customized or nonstandard behaviors, such as the use of remote DNS resolvers and the proportion of HTTP requests made directly to IP addresses (instead of to a domain name).
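Two of these nonstandard-behavior features are sketched below, under stated assumptions. First, the Host header is checked for an IP literal (IPv4, or bracketed or bare IPv6, with an optional port) using Python’s standard ipaddress module. Second, a device is flagged as using a remote resolver if it sends DNS queries to any server outside a hypothetical set of campus resolver addresses. The function names are also hypothetical.

```python
import ipaddress
from typing import Iterable, Set


def is_ip_literal(host: str) -> bool:
    """True if an HTTP Host header names an IP address rather than a domain."""
    host = host.strip()
    if host.startswith("["):                    # bracketed IPv6, possibly followed by :port
        host = host[1:host.find("]")]
    elif host.count(":") == 1:                  # IPv4 address or domain name with :port
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def ip_literal_fraction(hosts: Iterable[str]) -> float:
    """Proportion of a device's HTTP requests whose Host header is an IP literal."""
    hosts = list(hosts)
    return sum(is_ip_literal(h) for h in hosts) / len(hosts) if hosts else 0.0


def uses_remote_resolver(dns_servers: Iterable[str], campus_resolvers: Set[str]) -> bool:
    """True if the device sent DNS queries to any resolver other than the campus ones."""
    return any(server not in campus_resolvers for server in dns_servers)
```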

3.3.3. Detecting security incidents

To identify compromised devices (i.e., ones with a security incident), we use alerts generated by a campus network appliance running the Suricata IDS. The campus security system uses deep packet inspection with an industry-standard malware rule set to flag devices exhibiting post-compromise behavior.¹⁷

3.4. Ethical considerations and limitations

Having described our measurement methodology in considerable detail, we now consider the risks it presents—both to the privacy of network users and to the validity of conclusions drawn from these measurements.

Protecting user privacy. Foremost among the risks associated with the passive measurement approach is privacy. Even with the prevalence of encrypted connections (e.g., via TLS), processing raw network data is highly sensitive. From an ethical standpoint, the potential benefits of our research must be weighed against the potential harms from any privacy violations. In engaging with this question, and in developing controls for privacy risk, we involved a broad range of independent campus entities. These included our institutional review board (IRB), the campus-wide cybersecurity governance committee, and our network operations and cybersecurity staff. Together, these organizations provided necessary approvals, direction, and guidance on how best to structure our experiment, and provided strong support for the goals of our research. The campus security group has been particularly interested in using our measurements to gain insight into the security risks of devices operating on their network. Indeed, during the course of our work, we have been able to report a variety of unexpected and suspicious activities to campus for further action.

Operationally, we address privacy issues through minimization, anonymization, and careful control over data. First, as soon as each connection has been processed, we discard the raw content and log only metadata from the connection (e.g., a feature indicating that device X is updating antivirus product Y). Thus, the vast majority of data is never stored. Next, for those features we do collect, we anonymize the campus IP address and the last 24 bits of each MAC address using a keyed format-preserving encryption scheme.² Thus, we cannot easily determine which machine generated a given feature. As a matter of policy, we do not run any queries that attempt to make such determinations via reidentification. Finally, we use a combination of physical and network security controls to restrict access to both monitoring capabilities and feature data. This helps ensure that outside parties, not bound by our policies, are unable to access the data or our collection infrastructure. The server processing raw network streams is located in a secure campus machine room with restricted physical access. It accepts communications only from a small static set of dedicated campus machines and requires multi-factor authentication for all logins. Moreover, its activity is itself logged and monitored for anomalous accesses. We use similar mechanisms to protect the processed and anonymized feature data, although these servers are located in our local machine room. The feature dataset is accessible only to members of our group, subject to IRB and campus agreements, and will not (and cannot) be shared further.

Limitations of our approach. In addition to privacy risk, it is important to document the implicit limitations of our study arising from its focus on a residential campus population—primarily undergraduates—as well as the use of a particular IDS and rule set to detect security incidents.

It is entirely possible that the behavioral modes of this population, particularly with respect to security, are distinct from those of older, less affluent, or more professional cohorts. This population bias is also likely to affect time-of-day effects, as well as the kinds of hardware and software used. Additionally, the security incidents we consider depend on three things: the Suricata IDS, commercial network traffic signatures, and the security-related network usage requirements of our university environment (e.g., residential students are nominally required to have antivirus software installed on their devices before connecting). These incident detection biases may also influence which behaviors and software applications correlate with device compromise. Thus, if the same methodology were employed in other kinds of networks, serving other populations, or using different security incident detection techniques, the results might differ. For this reason, we hope to see our measurements replicated in other environments.

4. Dataset

We analyze six months of data from our passive network traffic processing system, from June 2018 to December 2018. In this section, we describe our approach for identifying the laptop and desktop devices used in analyzing security risk factors, and for determining the dominant OS of each of these devices. In the end, our dataset consists of 15,291 devices; Table 1 characterizes its traffic.

Table 1. Dataset characterization. Note that our network vantage point provides DNS requests from the local resolver, which includes DNS traffic from devices in this paper as well as other devices using the university’s networks.

4.1. Device filtering

The university allows heterogeneous devices on its network, such as personal computers, mobile phones, printers, Internet of Things (IoT) devices, and more. Recommended security practices, however, are commonly offered for laptop and desktop computers. We therefore focus our analysis solely on such devices, which requires identifying them among the many other devices on the network. We first remove devices that are easily identifiable and then apply heuristics to filter the remainder. Specifically, we remove four kinds of devices:

- those active for fewer than 14 days,
- those that never provide a major Web browser’s User-Agent string,
- those whose User-Agent strings consistently indicate a mobile or IoT OS, and
- those whose MAC address organizationally unique identifier (OUI) matches an IoT vendor.
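The four filtering rules above can be sketched as a single predicate over per-device summaries. The summary fields, the set of mobile or IoT OS families, and the IoT OUI list are hypothetical placeholders; only the 14-day threshold comes from the text. “Consistently” is interpreted here to mean that every OS family observed in the device’s User-Agent strings is a mobile or IoT OS.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical placeholders; real lists would come from User-Agent parsing and an OUI registry.
MOBILE_OR_IOT_OS_FAMILIES: Set[str] = {"iOS", "Android"}
IOT_VENDOR_OUIS: Set[str] = {"aa:bb:cc"}
MIN_ACTIVE_DAYS = 14   # threshold from the text


@dataclass
class DeviceSummary:
    mac: str                     # anonymized MAC; the OUI (first three octets) is left intact
    active_days: int             # days on which the device generated traffic
    major_browser_ua_seen: bool  # ever sent a major Web browser's User-Agent string
    ua_os_families: Set[str] = field(default_factory=set)  # OS families from User-Agents


def is_laptop_or_desktop(d: DeviceSummary) -> bool:
    if d.active_days < MIN_ACTIVE_DAYS:
        return False
    if not d.major_browser_ua_seen:
        return False
    if d.ua_os_families and d.ua_os_families <= MOBILE_OR_IOT_OS_FAMILIES:
        return False             # every User-Agent OS seen is mobile or IoT
    if d.mac.lower()[:8] in IOT_VENDOR_OUIS:
        return False             # OUI belongs to a (hypothetical) IoT vendor
    return True
```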

4.2. Identifying dominant OSs

Since different OSs have different risk profiles, identifying the OS used by a device is an important step. Inferring the OS from network traffic alone is not always straightforward, but most devices are easy to classify. Using the signatures of OS update events, we can immediately identify a single unambiguous OS for 79.1% of devices.

The remaining devices either have no OS update signatures or have more than one. For these devices, we combine OS update signatures, OS User-Agent strings, and OUI vendor name information to identify the dominant OS of a device (e.g., the host OS when running virtual machines, Windows if tethering an iPhone, etc.). We assume that devices with an Apple OUI vendor name use Mac OS (7.2%). We then use the dominant OS extracted from User-Agent strings to assign an OS (11.5%). The remaining 340 devices (2.1%) have both Windows and Mac OS updates. In these cases we assign Windows as the dominant OS, because of strong evidence of device tethering.¹ For each heuristic, we confirmed the labeling by manually checking the traffic profile of a random sample of devices.
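The heuristic cascade can be expressed as an ordered sequence of checks, as sketched below. The inputs are assumptions: the set of OSes seen in update signatures, whether the OUI vendor name is Apple, and the OS families parsed from User-Agent strings. The text does not specify what “dominant” means here, so the sketch interprets it as the most frequently observed User-Agent OS. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set


def dominant_os(update_oses: Set[str], apple_oui: bool, ua_oses: Iterable[str]) -> Optional[str]:
    """Apply the OS-labeling heuristics in the order given in the text."""
    if len(update_oses) == 1:                   # one unambiguous OS update signature
        return next(iter(update_oses))
    if apple_oui:                               # Apple OUI vendor name -> Mac OS
        return "Mac OS"
    ua_counts = Counter(ua_oses)
    if ua_counts:                               # most frequent OS among User-Agent strings
        return ua_counts.most_common(1)[0][0]
    if {"Windows", "Mac OS"} <= update_oses:     # both update types: likely tethering
        return "Windows"
    return None                                 # no heuristic applies
```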

5. Recommended Practices

There are a variety of security practices widely recommended by experts to help users become safer online. Prior work has explored some of these practices in terms of users being exposed to risky Websites.⁴,²² Since our data includes actual security outcomes, we start our evaluation by exploring the correlation of various security practices with actual device compromises in our user population: operating system choice, keeping software up to date, Websites visited, network use, antivirus use, and software used.

5.1. Operating system

Different operating systems have different security reputations, so it is not surprising that experts have recommendations of the form “Use an uncommon OS.”²⁰ Part of the underlying reasoning is that attackers will spend their efforts targeting devices running the most common systems, so using an uncommon operating system makes a device less of a target.

In terms of device compromise, as with previous work and experience, such advice holds for our user population as well. Using the OS classification method described in Section 4.2, Figure 2 shows the number of devices using each major operating system and the number of each that was compromised during our measurement period. Most devices use Windows or Mac OS, split nearly equally between the two. The baseline compromise rate among the devices is 4.5%, but Windows devices are 3.9× more likely to be compromised than Mac OS devices. The Chrome OS population is small, and only one such device was compromised.

Figure 2. Device OS classification after removing IoT and mobile devices: the total number of devices with each OS and the number with a security incident.

Of course, modulo dual-booting or using virtual machines, this kind of advice is only actionable when choosing a device, and is no help once a user is already using a system.

5.2. Update software

Among the hundreds of security experts surveyed, by far the most popular advice is to “Keep systems and software up to date.”²⁰ In this section, we explore the operating system, browser, and Flash update characteristics of the devices in our population, and how they correlate with device compromise.

5.2.1. Operating system

Mac OS. We start by analyzing the update behavior of devices running Mac OS. We see that 7268 (47.5%) devices are identified as Mac and are never absent from the network for more than three days. Of these, we see at least one update for 2113 of them (29.1% of all Mac OS devices). Figure 3 shows the update pattern of these Mac OS devices over time, anchored around the three OS updates released by Apple during our measurement period. In general, Mac OS users are relatively slow to update, anecdotally because of the interruptions and risks Mac OS updates entail.

Figure 3. Number of days a Mac OS device takes to update to a specific version. The version number on the x-axis denotes the day the specified version update was published.

Of these devices, 57 (2.7%) were compromised. Compromised devices have a mean and median time to update of 16.2 and 14.0 days, respectively, whereas their clean counterparts have a mean and median of 18.0 and 16.0 days. However, this difference is not statistically significant according to the Mann-Whitney U test (p = 0.13).
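For readers reproducing this kind of comparison, a statistics package such as SciPy’s `scipy.stats.mannwhitneyu` is the usual tool; the text does not say which implementation the authors used. The self-contained sketch below shows what the test computes. It derives the U statistic from pooled ranks and computes a two-sided p-value from the normal approximation with tie and continuity corrections. That approximation suits large samples like these, but not very small ones. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rankdata(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # positions i..j -> ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for x, two-sided p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = rankdata(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts: dict = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())   # sum over tie groups of t^3 - t
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0                          # all values identical: no evidence of a shift
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / sigma   # continuity-corrected z score
    return u1, min(math.erfc(z / math.sqrt(2)), 1.0)    # erfc(z / sqrt 2) = 2 * (1 - Phi(z))
```

A call such as `mann_whitney_u(compromised_lags, clean_lags)` would compare the per-device update delays of the two groups; both argument names are hypothetical.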

Windows. For Windows, we developed a signature to extract the knowledge base (KB) number of “Other Software” updates (e.g., Adobe Flash Player). Our signature detects when a device downloads the update, and we identify the update’s release day using Microsoft’s Update Catalog service.¹⁴

Across devices running Windows, we see at least one update for 6459 of them (84% of all Windows devices). Based on the averages and medians, devices update with similar delays (2.5 days and 0 days, respectively) regardless of whether they have a security incident. In short, the update behavior of compromised Windows devices differs little from that of clean devices.

5.2.2. Web browser

Browsers are large, complex pieces of software used on a daily basis and, as with most software, these large programs have vulnerabilities. We therefore explore how browser updating behavior differs between compromised and clean devices. As with Mac OS, we can detect the current browser version number from a device’s User-Agent string. We analyze only the dominant browser for each device. Users may use different browsers for different purposes, but identifying a dominant browser removes noise from applications that spoof a browser in their User-Agent string.
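The sketch below shows one way to select the dominant browser. It assumes User-Agent strings have already been parsed into (browser family, version) pairs in time order. Choosing the most frequently seen major browser is an assumed interpretation of “dominant.” Sightings from other families, such as applications that embed a browser engine, are discarded. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MAJOR_BROWSERS = {"Chrome", "Edge", "Firefox", "Safari"}   # the browsers analyzed in the text


def dominant_browser(sightings: Iterable[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the most frequently seen major browser family and its latest observed version."""
    kept = [(fam, ver) for fam, ver in sightings if fam in MAJOR_BROWSERS]
    if not kept:
        return None
    family = Counter(fam for fam, _ in kept).most_common(1)[0][0]
    latest = [ver for fam, ver in kept if fam == family][-1]   # input is in time order
    return family, latest
```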

Browser vendors publish the dates when they make updates available, so we can check whether the browser on a device is out of date each time we see the device on the network. We then calculate how quickly devices update across the measurement period. We analyzed updates for devices that predominantly use Chrome, Edge, Firefox, and Safari. For devices that are on the network continuously (absent for less than three consecutive days), Table 2 shows the browsers with statistically significant differences in update time between clean and compromised devices (Mann-Whitney U: Chrome p = 4.2 × 10⁻⁴ and Firefox p = 0.03).

Table 2. Number of days between when an update is published and when devices update. Compromised devices update faster than their clean counterparts across their lifetimes.
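The update-time measurement described above can be sketched as follows, taking vendor release dates and a device’s dated version sightings as inputs. For each release, the lag is the number of days from publication until the device is first seen running that version or a newer one. Releases the device never reaches while observed are left undefined. The function names are hypothetical, and the dates in the test are illustrative.

```python
from datetime import date
from typing import Dict, Optional, Sequence, Tuple


def vtuple(version: str) -> Tuple[int, ...]:
    """'71.0.3578.80' -> (71, 0, 3578, 80) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def update_lags(releases: Dict[str, date],
                sightings: Sequence[Tuple[date, str]]) -> Dict[str, Optional[int]]:
    """Days from each release's publication until the device first runs that version or newer."""
    seen = sorted(sightings)
    lags: Dict[str, Optional[int]] = {}
    for release, published in releases.items():
        first = next((day for day, v in seen
                      if day >= published and vtuple(v) >= vtuple(release)), None)
        lags[release] = (first - published).days if first is not None else None
    return lags
```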

Surprisingly, clean devices appear to spend more time out of date than their compromised counterparts. Examining in more detail, we compare the update behavior of compromised devices before and after their compromise date. We focus on devices using Chrome that have two updates spanning the compromise event (other browsers do not have a sufficiently large sample size). Figure 4 shows the distribution of times devices were out of date with respect to when a browser update was released for updates before and after the device was compromised. The shift in distributions illustrates that devices update faster after compromise: devices that use Chrome have a before-compromise mean update rate of 18.9 days (18.0 days median) and an after-compromise mean update rate of 14.2 days (15.0 days median). This difference is significant with p = 4.8 × 10⁻¹² using the Wilcoxon signed-rank test.

Figure 4. Distribution of days a device takes to update Chrome before compromise and after compromise.
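The before/after comparison uses a paired test. The sketch below computes the Wilcoxon signed-rank statistic with a normal approximation, dropping zero differences and using a tie-corrected variance; `scipy.stats.wilcoxon` provides a maintained implementation. Pairing one summary value per device, such as its mean update lag before and after compromise, is an assumption about how the pairs were formed. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, two-sided p) for paired samples, via the normal approximation."""
    diffs = [b - a for b, a in zip(before, after) if b != a]   # zero differences dropped
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank |d| with average ranks for ties, accumulating the tie correction term.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        t = j - i + 1
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)     # sum of ranks of positive diffs
    mu = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if var <= 0:
        return w_plus, 1.0
    z = abs(w_plus - mu) / math.sqrt(var)
    return w_plus, math.erfc(z / math.sqrt(2))
```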

5.2.3. Flash Player

The Adobe Flash Player has long been associated with security risk and device compromise. The typical recommendation is not to use Flash at all, but if you do, to keep it up to date. We created a signature to detect Adobe Flash Player on Windows devices. We focused on the desktop version of Flash because major browser vendors distribute Flash plugin updates themselves. Adobe released six updates within our measurement period, and we used Adobe’s Website to identify the version and release date for each.

Somewhat surprisingly, desktop Flash is still quite prevalent. A total of 2167 devices (28% of Windows devices) check for a Flash Player update, of which 1851 are seen downloading an update. Table 3 shows the average, median, P90, P95, P99, and variance of the number of days between when an update is released and when it is downloaded. The compromise rate of devices that update Flash is 8.1%, only slightly higher than the rate for all Windows devices (7.9%) (Chi-Square p = 0.057). Among the 316 devices on which we detect Flash Player but see no updates, only 15 (4.8%) are compromised. We interpret these results as a community success story: a combination of widespread awareness, aggressive updates, and focused attention has mitigated desktop Flash as a significant risk factor.

Table 3. Flash Player updates on Windows devices.
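The text does not state the exact contingency table behind the Chi-Square result. As a sketch, assume a 2×2 table of group (rows) against outcome (compromised or clean, columns). The statistic and its one-degree-of-freedom p-value can then be computed as below. Yates’ continuity correction is optional and mirrors the default of `scipy.stats.chi2_contingency` for 2×2 tables. The function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> Tuple[float, float]:
    """Pearson chi-square test of independence on [[a, b], [c, d]] (1 degree of freedom)."""
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    if 0 in row or 0 in col:
        return 0.0, 1.0                          # degenerate table: nothing to test
    chi2 = 0.0
    for observed, r, k in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = row[r] * col[k] / n           # expected count under independence
        diff = abs(observed - expected)
        if yates:
            diff = max(diff - 0.5, 0.0)
        chi2 += diff ** 2 / expected
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```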

Curiously, compromised devices updated Flash slightly faster than clean devices (Mann-Whitney U test p = 0.025). We hypothesized that a compromised device’s update behavior might change after compromise, so we compared the update patterns of compromised devices before and after becoming compromised. Of the 149 compromised devices that update Flash, 60 (40.3%) have updates both before and after their first incident. The median and average days compromised devices take to update are 6.5 and 9.9, respectively, before an incident, and 0 and 1 days after becoming compromised (Wilcoxon signed-rank test p = 1.73 × 10⁻⁷). As with Chrome browser update behavior, these results suggest that shortly after a security incident, devices exhibit better Flash update hygiene.

5.3. Visit reputable websites

Experts recommend that users be careful about the websites they visit (“visit reputable websites”²⁰), and indeed prior work has found that the categories of websites users visit can indicate exposure to risky sites.⁴,²² We perform a similar analysis for devices that are actually compromised. For the most part, we confirm that the types of sites associated with exposure to risky sites also correlate with actual compromise.

To categorize the content devices access, we use the IAB domain taxonomy (Section 3.3.2). For each category, we use the Kolmogorov-Smirnov (KS) test with Bonferroni correction to compare clean and compromised devices. The test compares the ECDFs of the fraction of distinct registered domains in that category each group accesses, and we confirm that the differences are statistically significant (i.e., p < 0.001).
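A sketch of the per-category comparison follows. It computes the two-sample KS statistic as the largest gap between the two ECDFs and converts it to an asymptotic p-value with the Kolmogorov series. It then applies a Bonferroni correction by multiplying each p-value by the number of categories tested. Treating 0.001 as the threshold after correction is an assumption, and the function names are hypothetical. `scipy.stats.ks_2samp` is the usual library routine.

```python
import bisect
import math
from typing import Dict, Sequence, Tuple


def ks_2samp(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Kolmogorov-Smirnov test with the asymptotic Kolmogorov p-value."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:                    # the ECDF gap is maximized at an observed value
        gap = abs(bisect.bisect_right(xs, v) / n1 - bisect.bisect_right(ys, v) / n2)
        d = max(d, gap)
    lam = math.sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.2:
        return d, 1.0                    # the Kolmogorov tail probability is ~1 here
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def significant_categories(clean: Dict[str, Sequence[float]],
                           compromised: Dict[str, Sequence[float]],
                           alpha: float = 0.001) -> Dict[str, float]:
    """Bonferroni-adjusted p-values of categories whose distributions differ significantly."""
    m = len(clean)                       # number of tests performed
    result = {}
    for cat in clean:
        _, p = ks_2samp(clean[cat], compromised[cat])
        if p * m < alpha:
            result[cat] = min(p * m, 1.0)
    return result
```

Each input list would hold one value per device: that device’s fraction of distinct registered domains in the category.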

Table 4 shows the most substantial differences in the types of content accessed. For example, clean devices access more business, advertising, and marketing content, whereas compromised devices access more gaming, hobby, uncategorized, and illegal content. Although previous work found that exposed devices visit more advertising domains,²² our finding of the opposite behavior can be explained by differences in methodology: the previous finding relied solely on HTTP requests generated by static content, whereas our network traces include all HTTP requests (including those generated by JavaScript) as well as HTTPS traffic.

Table 4. Types of content accessed more by clean or compromised devices. We show the median fraction of registered domains accessed in the category for clean (Cln.) and compromised (Cmp.) devices, and the difference between the two medians.
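The per-device quantity behind Table 4 (and the KS comparison in the preceding paragraphs) is the fraction of a device's distinct registered domains that fall in a given category. A minimal sketch, assuming a hypothetical lookup from registered domain to a set of category labels; the label strings used here are illustrative, not actual IAB taxonomy names, and reporting the difference as compromised minus clean is our convention.

```python
from statistics import median


def category_fraction(domains, categories, target):
    """Fraction of a device's distinct registered domains labeled `target`.

    domains: registered domains the device accessed (repeats allowed).
    categories: hypothetical lookup, registered domain -> set of labels.
    """
    distinct = set(domains)
    if not distinct:
        return 0.0
    hits = sum(1 for d in distinct if target in categories.get(d, ()))
    return hits / len(distinct)


def median_fractions(clean, compromised, categories, target):
    """Median per-device fraction for clean and compromised devices, plus
    their difference (compromised minus clean)."""
    cln = median(category_fraction(d, categories, target) for d in clean)
    cmp_ = median(category_fraction(d, categories, target) for d in compromised)
    return cln, cmp_, cmp_ - cln
```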

5.4. Network use

One trend is simply that compromised devices generate more Web traffic than clean devices. Figure 5 shows the distributions of average weekly Web activity for clean and compromised devices. For every device, we count the number of fully qualified domain names (FQDNs) the device visits per week over HTTP and HTTPS combined, and normalize by averaging across all weeks in which the device was active. Each bar in the histogram counts the number of devices that visit a given number of FQDNs per week, using 100-domain bins. The distribution for compromised devices is clearly shifted toward visiting more sites per week (and other traffic granularities show similar behavior). We interpret this result as simply reflecting that more activity correlates with greater exposure and risk (much as more driving correlates with more automobile accidents).
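A minimal sketch of the Figure 5 computation, assuming a hypothetical log of (device, week index, FQDN) records already merged from HTTP and HTTPS traffic: count distinct FQDNs per device-week, average over the weeks in which the device appears, and bin the averages into 100-domain buckets. Treating any week with at least one record as "active" is our assumption.

```python
from collections import Counter, defaultdict


def avg_weekly_fqdns(visits):
    """visits: iterable of (device_id, week_index, fqdn) tuples (hypothetical layout)."""
    per_week = defaultdict(set)  # (device, week) -> distinct FQDNs that week
    for device, week, fqdn in visits:
        per_week[(device, week)].add(fqdn.lower().rstrip("."))
    counts = defaultdict(list)  # device -> weekly distinct-FQDN counts
    for (device, _week), names in per_week.items():
        counts[device].append(len(names))
    # Only weeks with traffic appear, so this averages over active weeks.
    return {device: sum(c) / len(c) for device, c in counts.items()}


def histogram(averages, bin_width=100):
    """Devices per bin; each key is the bin's lower edge (0, 100, 200, ...)."""
    return Counter(int(avg // bin_width) * bin_width for avg in averages.values())
```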

Figure 5. Distributions of average weekly device Web activity for clean and compromised devices.

5.5. Use antivirus

Using antivirus software is a nearly universal recommendation and, indeed, residential students on our campus are nominally required to have antivirus software installed on their devices to use the network. We crafted signatures to detect network activity from over a dozen antivirus products. Focusing on Windows devices, a larger percentage of devices with antivirus are compromised (7%) than of devices without it (4%). By definition, though, most compromised devices in our population were compromised by malware that antivirus did not catch.
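One way to sketch signature-based product detection, assuming the traces expose the hostnames each device contacts, is to match those hostnames against domain suffixes associated with each product. The product names and domains below are hypothetical placeholders under the reserved `.test` TLD; the study's actual signatures are not listed in the text.

```python
# Hypothetical signatures: product -> domain suffixes its update traffic contacts.
AV_SIGNATURES = {
    "ExampleAV": ("updates.example-av.test",),
    "OtherAV": ("defs.otherav.test", "telemetry.otherav.test"),
}


def matches_suffix(host: str, suffix: str) -> bool:
    """True if host equals suffix or is a subdomain of it (case-insensitive)."""
    host = host.lower().rstrip(".")
    return host == suffix or host.endswith("." + suffix)


def detect_products(hosts, signatures=AV_SIGNATURES):
    """Return the set of products whose signatures match any observed host."""
    return {
        product
        for host in hosts
        for product, suffixes in signatures.items()
        if any(matches_suffix(host, s) for s in suffixes)
    }
```

Matching on a label boundary (`"." + suffix`) rather than a bare `endswith` keeps a look-alike host such as `notupdates.example-av.test` from matching.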

5.6. Software use

As discussed in Section 3.3, we extract a wide variety of features about the software used on devices observed on the network. We now explore how these software features correlate with a device being compromised. Since the compromise rate depends on the operating system (Windows devices are compromised more often than Mac OS devices), we examine software features both across all devices and within each individual operating system.

For each software feature correlated with compromise, Table 5 shows the number of devices with the feature, the fraction of those devices that are compromised, and the fraction of devices without the feature that are compromised. These results provide direct comparisons of compromise rates between devices with and without a particular software feature: for example, devices using Tor are compromised 2 to 3.5X more often than devices that do not. Because these are binary categorical features, we assess statistical significance with the Chi-Square test with Bonferroni correction, and the very low p-values shown in Table 5 confirm that the differences are significant.
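A standard-library sketch of the per-feature comparison: compromise rates with and without a feature, a Pearson chi-square test on the resulting 2×2 table, and a Bonferroni adjustment. This version omits Yates' continuity correction, which some tools (for example `scipy.stats.chi2_contingency`, by default) apply to 2×2 tables; which variant the study used is not stated. The function names are ours.

```python
import math


def chi_square_2x2(with_comp, with_total, without_comp, without_total):
    """Pearson chi-square test of independence for a binary feature vs. compromise.

    Returns (rate_with, rate_without, chi2, p_value); 1 degree of freedom,
    no continuity correction.
    """
    observed = [
        [with_comp, with_total - with_comp],
        [without_comp, without_total - without_comp],
    ]
    n = with_total + without_total
    rows = [with_total, without_total]
    cols = [with_comp + without_comp, n - with_comp - without_comp]
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column of the table needs at least one device")
    chi2 = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return with_comp / with_total, without_comp / without_total, chi2, p


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value for one of n_tests comparisons."""
    return min(1.0, p * n_tests)
```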

Table 5. Software features across device populations correlated with compromise. For each feature, we show the number of devices with the feature, the p-value from the Chi-Square test, and the fraction of devices with and without the feature that are compromised. Compromise rates: All devices 4.5%, Windows devices 7.0%, and Mac OS devices 1.9%.

Devices using some specific applications correlate very strongly with compromise, independent of operating system and network activity. Devices using Adobe AIR, P2P file sharing networks, Thunderbird, and Tor are, on average, much more likely to be compromised than devices that do not use such applications, and are therefore at significantly greater risk. The Thunderbird email client is particularly ironic: one reason people use Thunderbird is its PGP integration,⁷ yet Thunderbird is rife with reported vulnerabilities (CVE Details lists 420 code execution vulnerabilities⁶).

6. Ranking Feature Importance

Our analyses so far have focused on individual security practices. As a final step, we use statistical modeling to explore the relative importance of all the features we extract, as well as the relative importance of features exhibited during the hour before a device is compromised. Our goal is not to train a general security incident classifier, but instead to build a model for ranking the relative importance of our features.

6.1. Experimental setup

Logistic regression is a statistical technique for predicting a binary response variable from explanatory variables.¹¹ We set the response variable to whether or not a device is compromised, and use all of the device features we extract from the network as the explanatory variables. We first split the data into training (50%) and test (50%) sets, and normalize the explanatory variables to have zero mean and unit variance.
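A minimal sketch of the split-and-normalize step, assuming features arrive as a list of numeric rows. Fitting the means and standard deviations on the training half only, and reusing them for the test half, is a common precaution that we assume here; the function name and the fixed seed are illustrative.

```python
import random
import statistics


def split_and_standardize(X, y, seed=0):
    """Shuffle, split 50/50 into train/test, and z-score every column.

    Means and standard deviations come from the training half only and are
    reused for the test half (an assumption, not a detail from the text).
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    columns = list(zip(*(X[i] for i in train)))
    means = [statistics.fmean(c) for c in columns]
    # A constant column has std 0; divide by 1 instead to avoid dividing by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]

    def z(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    return (
        [z(X[i]) for i in train], [y[i] for i in train],
        [z(X[i]) for i in test], [y[i] for i in test],
    )
```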

Because we have a large number of explanatory variables, we use L1-regularized logistic regression, whose penalty drives the coefficients of uninformative variables to exactly zero and thereby identifies the important ones. To find the optimal regularization parameter, we perform hyperparameter tuning: we build 200 models, each with a different regularization parameter, and identify the model that performs best. To identify the best model while avoiding selection bias, we evaluate each model with 10-fold cross-validation.
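The tuning procedure can be sketched without external libraries: fit L1-regularized logistic regression for each candidate penalty and keep the penalty with the best mean cross-validated AUC. The solver here, proximal gradient descent (ISTA) with soft-thresholding, is a swapped-in technique chosen for brevity; the text does not say which solver was used. The learning rate, epoch count, and stratified folds are also our assumptions. With scikit-learn, `LogisticRegressionCV(penalty="l1", solver="liblinear", Cs=200, cv=10, scoring="roc_auc")` covers similar ground.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logreg(X, y, lam, lr=0.1, epochs=200):
    """L1-regularized logistic regression via proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is unpenalized.
    lr=0.1 assumes standardized features and a modest feature count.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(d):
                gw[j] += err * xi[j] / n
        b -= lr * gb
        # Gradient step, then soft-thresholding (the proximal step for L1),
        # which sets weak coefficients exactly to zero.
        steps = [wj - lr * g for wj, g in zip(w, gw)]
        w = [math.copysign(max(abs(s) - lr * lam, 0.0), s) for s in steps]
    return w, b


def auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    """Return (penalty, mean AUC) for the penalty with the best k-fold CV AUC."""
    rng = random.Random(seed)
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Stratified folds so every fold holds both classes (needs >= k of each).
    folds = [pos[f::k] + neg[f::k] for f in range(k)]
    best = None
    for lam in lambdas:
        fold_aucs = []
        for fold in folds:
            held = set(fold)
            train = [i for i in range(len(X)) if i not in held]
            w, b = fit_l1_logreg([X[i] for i in train], [y[i] for i in train], lam)
            scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in fold]
            fold_aucs.append(auc(scores, [y[i] for i in fold]))
        mean_auc = sum(fold_aucs) / k
        if best is None or mean_auc > best[1]:
            best = (lam, mean_auc)
    return best
```

The text's 200-model sweep would correspond to passing 200 candidate penalties, for example a log-spaced grid, as `lambdas` with `k=10`.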

To compare the importance of each feature, we implement a greedy deletion algorithm.⁹ We start with the N features that the best model (previous paragraph) identifies as important for predicting security incidents. For each of the N combinations of N − 1 features, we train regularized models with hyperparameter tuning. From the resulting models, we identify the one with the maximum area under the curve (AUC) on validation data, and exclude the feature it omits from the next iteration of the algorithm, since that feature contributes least to the overall AUC compared with the other feature combinations. We repeat this process until we have a model that uses a single feature (N = 1); the remaining feature contributes the most to the AUC both by itself and in the presence of other features. Finally, we interpret the results in terms of the changes to the test AUC when features are added to the final model.
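A compact sketch of the greedy deletion loop. It assumes a hypothetical callable `validation_auc(features)` that retrains a tuned, regularized model on the given feature subset and returns its validation AUC; in the text, each such call involves its own hyperparameter tuning. Breaking AUC ties by feature name is our choice, made only to keep the result deterministic.

```python
from typing import Callable, Sequence


def greedy_deletion(
    features: Sequence[str],
    validation_auc: Callable[[tuple[str, ...]], float],
) -> tuple[list[str], list[tuple[tuple[str, ...], float]]]:
    """Rank features by repeatedly dropping the least useful one.

    Returns (ranking from most to least important, [(subset, validation AUC), ...]).
    """
    remaining = list(features)
    removed = []
    path = [(tuple(remaining), validation_auc(tuple(remaining)))]
    while len(remaining) > 1:
        # Score every subset that leaves out exactly one feature;
        # ties on AUC are broken by feature name.
        best_auc, drop = max(
            (validation_auc(tuple(f for f in remaining if f != cand)), cand)
            for cand in remaining
        )
        # The feature omitted from the best subset contributes least; drop it.
        remaining.remove(drop)
        removed.append(drop)
        path.append((tuple(remaining), best_auc))
    # Last survivor is most important; earlier deletions are less important.
    return remaining + removed[::-1], path
```

Reading `path` from the end forward gives the AUC gained as each feature is added back, which is the quantity the text reports on test data.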

6.2. All features

We run the greedy deletion algorithm multiple times with different device groupings: all devices, Windows devices, Mac OS devices, and devices that generate more HTTP traffic than the median. We include this last grouping because of our observation in Section 5.4 that compromised devices generate more Web traffic. Table 6 shows the top four features for each grouping, each feature's AUC contribution when predicting on validation and test data, and the ratio of the feature's median (continuous) or mean (categorical) value for compromised devices to that for clean devices. Because we select the feature combination with the highest validation AUC, adding an extra feature can make a small negative contribution to the test AUC (e.g., the “HTTP Traffic at 2AM” feature for Mac OS devices).
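The ratio column in Table 6 can be computed per feature as sketched below, assuming one value per device and a matching compromised/clean flag. How the study handled a zero statistic for clean devices is not stated; returning infinity or NaN in that case is our assumption.

```python
import math
from statistics import mean, median


def compromised_clean_ratio(values, compromised, categorical):
    """Median (continuous) or mean (categorical) of a feature for compromised
    devices divided by the same statistic for clean devices.

    values: one feature value per device; compromised: matching booleans.
    A ratio above 1 means compromised devices exhibit more of the feature.
    """
    stat = mean if categorical else median
    comp = stat([v for v, c in zip(values, compromised) if c])
    clean = stat([v for v, c in zip(values, compromised) if not c])
    if clean == 0:
        # Ratio is undefined; report infinity when only compromised devices show it.
        return math.inf if comp > 0 else math.nan
    return comp / clean
```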

Table 6. AUC gains from the top four features used to detect devices with security incidents, as well as the ratio of median (continuous) or mean (categorical) values. Ratios > 1 (green) indicate compromised devices exhibit more of the feature.

Our results indicate that behavioral features, regardless of device grouping, are most correlated with device compromise. In every grouping, the top feature relates to how much Web content a device accesses or to the type of content accessed. Having Windows antivirus products (a proxy for using Windows, which has a significantly higher compromise rate) and using P2P applications are the only two software features in the top four of any grouping. The high ranking of the IE User Agent feature highlights the challenge of cursory feature extraction: applications can embed browsers, and examining traffic with an IE User Agent string shows that many of the detections actually come from the QQ chat application and the Qihoo 360 security product, not the IE browser. We also find that, for all but two features (both in the Mac OS grouping), compromised devices exhibit more of each feature than clean devices do.

6.3. One hour before compromise

Lastly, we use our statistical model to examine the relative importance of security features in the hour leading up to device compromise: compared to devices that are not compromised, how do compromised devices behave differently just before becoming compromised? For each compromised device, we extract its features from the hour before its first incident. To compare differences in behavior, we construct a synthetic control by taking a pseudorandom sample of clean devices. Specifically, for each compromised device, we randomly select up to 300 clean devices that are (1) active in the same hour window and (2) visit at least 50 distinct registered domains.
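The control selection can be sketched as follows, assuming hypothetical inputs: each compromised device's hour index just before its first incident, and, for clean devices, the set of registered domains visited in each active hour. A seeded generator makes the pseudorandom sample reproducible; sorting each candidate pool first keeps the draw independent of dictionary order.

```python
import random
from collections import defaultdict


def sample_controls(compromised_hours, clean_activity, k=300, min_domains=50, seed=0):
    """Pick up to k clean control devices per compromised device.

    compromised_hours: device -> hour index just before its first incident.
    clean_activity: (hour index, clean device) -> set of registered domains
        visited in that hour (hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed: pseudorandom, reproducible sample
    eligible = defaultdict(list)  # hour -> clean devices meeting both criteria
    for (hour, device), domains in clean_activity.items():
        if len(domains) >= min_domains:
            eligible[hour].append(device)
    controls = {}
    for device, hour in compromised_hours.items():
        pool = sorted(eligible.get(hour, []))
        controls[device] = rng.sample(pool, min(k, len(pool)))
    return controls
```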

Table 7 shows the most important features (relative to one another) for identifying compromised devices an hour before they are compromised. For our devices, the types of Websites visited (Section 5.3) are the most distinguishing features. On average, compromised devices visit more Websites in each of the eight categories in Table 7 than clean devices do. The most popular domains our devices visit in these categories do correspond well to the category labels. Even for some of the very generic labels, “Computer Games” are gaming sites, “Computer Networking” includes ISPs and IP geolocation services, “Internet Technology” includes SSL certificate sites and registrars, and so on.

Table 7. AUC gains for the top eight features for detecting devices with security incidents one hour before compromise.

7. Conclusion

The practice of cybersecurity implicitly relies on the assumptions that users act “securely” and that our security advice to them is well-founded. In this paper, we have sought to ground both assumptions empirically: measuring both the prevalence of key security “best practices” as well as the extent to which these behaviors (and others) relate to eventual security outcomes. We believe that such analysis is critical to making the practice of security a rigorous discipline and not simply an art.

However, achieving the goal of evidence-based security is every bit as formidable as delivering evidence-based healthcare has proven to be. In any complex system, the relationship between behaviors and outcomes can be subtle and ambiguous. For example, our results show that devices using the Tor anonymizing service are significantly more likely to be compromised. This is a factual result in our data. However, there are a number of potential explanations for why this relationship appears: Tor users could be more risk-seeking and expose themselves to attack, or they might be more targeted, or there might be vulnerabilities in Tor itself. Indeed, it is even possible that Tor use simply happens to correlate with the use of some other software package that is the true causal agent.

Thus, although some of our results seem likely not only to have explanatory power but also to generalize (e.g., the use of Thunderbird and Adobe AIR, both historically rife with vulnerabilities, has significant correlations with host compromise), others demand more study across a broader range of populations (e.g., why are gamers more prone to compromise?). Results that lack simple explanations reflect the complexity of the task at hand. Having started down this path of inquiry, though, we are optimistic about answering these questions because we have shown that the methodological tools for investigating such phenomena are readily available. We look forward to a broader range of such research as our community helps advance security decision-making from the “gut instinct” practice it is today to one informed and improved by the collection of concrete evidence.

Acknowledgments

This work was supported in part by NSF grants CNS-1629973 and CNS-1705050, DHS grant AFRL-FA8750-18-2-0087, and the Irwin Mark and Joan Klein Jacobs Chair in Information and Computer Science.


This work is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

DOI: 10.1145/3547133

September 2022 Issue, Vol. 65 No. 9, Pages 93-102

Published: September 1, 2022


Communications of the ACM (CACM) is now a fully Open Access publication.
