Communications of the ACM

Enhancing Efficiency in the Health Care Industry

Organizations across industry sectors have intensified their initiatives to increase operational efficiency through effective resource allocation, and the health care sector is no exception. The health care industry also faces a number of additional factors that have increased the complexity of managing available resources. These factors include the introduction of new organizations such as HMOs and PPOs, a growing aging population, the rising costs of defensive medicine, and the challenges of optimizing existing health care facility usage (such as staffing doctors and nurses along with monitoring designated bed utilization rates).

One way to enhance operational efficiency in this sector is by more accurately identifying the sources of future high resource demand and initiating strategic management tactics to mitigate the potentially significant costs of fully developed illnesses [6]. Quantitatively based analytic techniques have been utilized to help increase operational efficiencies by enhancing the decision making process in medical treatment procedures [4]. More specifically, they enable decision makers to identify patterns in clinical, claims, and activity-based historical data in order to create models to more accurately predict future resource demand. Because it decreases the treatment process variability typically found in health care, increased predictive capability ultimately helps reduce inefficient allocation of health service resources. The purpose of this research is to illustrate the benefits of incorporating predictive modeling to more accurately identify patients likely to develop chronic illness.

American Healthways Corporation's predictive analysis addresses a common scenario in the health care industry: developing programs to reduce costs associated with the top 20% of its future high cost members. Such members are likely to develop a chronic illness (which is diabetes in this study), and exhibit high utilization levels of health care service resources. AMHC's Total Population Predictive Model incorporated a neural network methodology to analyze claims-based data in a health plan population to predict future high-risk members. This methodology achieved positive results.

At a threshold level of 20% of true high cost members, the AMHC predictive model correctly identifies 57% of the true future high cost members, while an alternate, prior cost model, correctly identifies only 34% of true future high cost members. This figure depicts the enhanced predictive accuracy of the neural net approach over that of pure chance and the prior cost approach.

The average cost of the top 20% of high cost members is $7,225. The difference in the total number of correctly identified high cost members between the neural net model and the prior cost model is 2,236 members (5,621 - 3,385). Consequently, the total cost associated with this difference in accuracy is over $16 million ($7,225 × 2,236). This difference in savings potential between the two models represents a significant opportunity for the health plan.
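The arithmetic behind this estimate can be checked in a few lines. The sketch below uses only the figures quoted in the text; the variable names are illustrative, and pricing every additionally identified member at the average cost is the same simplification the estimate itself makes.

```python
# Figures quoted in the text; variable names are illustrative.
neural_net_hits = 5_621   # high cost members correctly identified by the neural net model
prior_cost_hits = 3_385   # high cost members correctly identified by the prior cost model
avg_high_cost = 7_225     # average cost (USD) of a member in the top 20% by cost

extra_members = neural_net_hits - prior_cost_hits
cost_at_stake = extra_members * avg_high_cost

print(f"{extra_members:,} additional members, ${cost_at_stake:,} in associated cost")
```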

Conceptual Issues For Managing Health Care Costs

Typically, a relatively small percentage of a health plan's members accounts for a disproportionately large percentage of total health care costs [7]. Consequently, there is tremendous value to be derived from correctly identifying high-risk patients before their health status begins to significantly degrade, and offering them the appropriate behavioral or clinical interventions.

In order to identify those patients who would most benefit from disease management and educational efforts, many health plans "risk stratify," or classify, members who suffer from chronic disease conditions [3]. For example, a health plan might stratify its diabetes members into three groups based on estimated current and/or future risk (high, medium, and low risk). In most cases, insurers rely on rules-based risk stratification models to classify members within a given disease population.

There are limitations to rules-based stratification algorithms, however, that diminish their accuracy with respect to predicting future risk. The first is that rules-based methods are fairly subjective, as there exist no widely accepted clinical standards for rating disease specific or total population risk. Second, rules-based algorithms are frequently created from a loose combination of historical experience and principles taken from the medical literature, rather than being systematically derived and validated through inclusive, empirical methods. These limitations imply that the methods tend to both lack reasonable predictive power, and exhibit regression to the mean with repeated use [1].

From a practical standpoint, rules-based models frequently fail to identify those patients who are at highest risk, leading plan managers to inefficiently allocate scarce intervention resources to patients who do not need them. The net result is that both medical outcomes for high-risk patients and total plan expenditures are negatively affected.

AMHC uses an empirical claims-based approach to predictive modeling and risk stratification with the purpose of creating a model that correctly identifies future high-risk patients based on information derived from a health plan's claims history. It collects and processes all relevant medical claims, pharmacy, lab result, and clinical data to develop, calibrate, and implement predictive modeling risk factors for an entire population. Usually, 50 to 75 factors are created from these sources to predict high-risk patients.

The independent variable risk factors are based on epidemiological and clinical experience with chronic disease conditions, as well as administrative experience with a wide range of commercial health plans. Typically, a health plan provides two or three years worth of claims data from which these risk factors are extracted. After a full set of risk factors has been extracted from client claims data, AMHC utilizes a wide array of artificial intelligence-based, statistical and econometric methods to investigate the data and create a comprehensive predictive model. In general, a study methodology follows these basic steps:

  1. Define a study population of continuous enrollment for nine consecutive quarters of plan membership.
  2. Extract from the data set the total population risk factors from Year One, and outcomes from Year Two.
  3. Train and develop a neural net model that predicts Year Two outcomes as a function of Year One factors.
  4. Apply the final tested and cross-validated neural net model to predict the Year Three high cost members.
  5. Compare actual Year Three outcomes to Year Three predictions.
  6. Revalidate the model against prior period results to ensure robustness.
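Step 2 requires turning each member's cost in the outcome year into a binary target. A minimal sketch, assuming the outcome is membership in the top 20% of members by cost (the cutoff used in the model comparison earlier in the article); the function name, the tie handling, and the sample costs are hypothetical and not taken from the study.

```python
def label_high_cost(costs, fraction=0.20):
    """Return a 0/1 label per member: 1 if the member is in the top `fraction` by cost."""
    n_high = max(1, round(len(costs) * fraction))
    # Member indices sorted from highest to lowest cost (stable, so ties keep input order).
    ranked = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    high = set(ranked[:n_high])
    return [1 if i in high else 0 for i in range(len(costs))]

# Hypothetical outcome-year costs for ten members.
year_two_costs = [120.0, 9800.0, 0.0, 430.0, 15200.0, 75.0, 2100.0, 310.0, 640.0, 50.0]
labels = label_high_cost(year_two_costs)
print(labels)
```

The same function would be applied to Year Three costs to obtain the actual outcomes used in step 5.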

This study was a retrospective cohort analysis of members of a large commercial health plan located in the southeastern U.S. Two years of claims data was used to develop and calibrate the model, using split-sample validation to ensure the reliability (that is, repeatability) as well as the validity of this predictive model. Cost, rather than utilization, was used as the dependent measure because it is often more sensitive to the overall severity associated with medical services. Predictors (independent variables selected as a subset of all risk factors extracted) were selected based on their predictive probabilities using decision trees. A "selection tree" (a decision tree used for variable selection) was used to identify variables with high predictive probabilities for inclusion in the neural net model. Variables were retained based on a Chi-Square Test with a significance level of 0.2. Those variables with the highest predictive probabilities were selected for inclusion in the neural net predictive model. Ultimately 14 variables were included in the final model.
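To illustrate the retention rule, the sketch below runs a plain Pearson chi-square test on a 2×2 table (binary factor versus high/low cost) and keeps factors whose p-value falls below 0.2. This is a simplified stand-in, not the selection tree used in the study: the candidate factor names and all counts are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for the table

                  high cost   low cost
      factor = 1      a           b
      factor = 0      c           d
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

ALPHA = 0.2  # significance level quoted in the text

# Hypothetical counts for two candidate factors (not data from the study).
candidates = {
    "prior_year_inpatient_stay": (60, 140, 140, 660),
    "uninformative_flag": (100, 400, 100, 400),
}
retained = [name for name, table in candidates.items()
            if chi_square_2x2(*table)[1] < ALPHA]
print(retained)
```

A threshold as loose as 0.2 keeps weakly associated factors in play, leaving the neural net to sort out their joint contribution.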

A third year of claims data was used to assess the accuracy of this model. Professional, pharmacy, and institutional claims for continuously enrolled members in this health plan's HMO and point of service (POS) products were provided. Professional and pharmacy data was available at the claim level, whereas institutional claims were reported as a complete episode of care. A total of 54,206 health plan members from 3 to 71 years of age met the continuous Year One to Year Two enrollment criteria required for model development and calibration. In Year Three, only 46,141 of the 54,206 continuously enrolled members actually incurred claims. Input variable names used in the neural net predictive model are included below. The average age for these members was 34 years, with a gender distribution of 51.2% females and 48.8% males. Approximately 4% of this population was diagnosed with diabetes (see Table 1 for details).

The neural network used in this study included 14 input variables, one hidden layer, three hidden units, and one target variable. The model architecture was a multilayer perceptron, using backward propagation with randomized target bias weights. Results of the model were evaluated using the following steps:
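The size of this network is easy to see in code. The sketch below builds a forward pass with the stated shape (14 inputs, one hidden layer of three units, one output). The tanh and logistic activations, the weight initialization range, and all names are assumptions; the article does not specify them, and training by backpropagation is omitted.

```python
import math
import random

N_INPUTS, N_HIDDEN = 14, 3   # sizes quoted in the text

def init_weights(rng):
    """Small random starting weights; the initialization scheme is an assumption."""
    return {
        "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
                     for _ in range(N_HIDDEN)],
        "b_hidden": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b_out": rng.uniform(-0.5, 0.5),
    }

def predict(weights, x):
    """Forward pass: 14 inputs -> 3 tanh hidden units -> 1 logistic output in (0, 1)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(weights["w_hidden"], weights["b_hidden"])]
    z = sum(w * h for w, h in zip(weights["w_out"], hidden)) + weights["b_out"]
    return 1.0 / (1.0 + math.exp(-z))

def count_parameters(weights):
    """Total number of trainable weights and biases."""
    return (sum(len(row) for row in weights["w_hidden"]) + len(weights["b_hidden"])
            + len(weights["w_out"]) + 1)

weights = init_weights(random.Random(0))
member = [0.0] * N_INPUTS            # hypothetical, already-scaled risk factors
score = predict(weights, member)
print(count_parameters(weights))     # 14*3 + 3 + 3 + 1 = 49
```

With only 49 trainable parameters against tens of thousands of members, a network of this shape is small relative to the data available to fit it.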

  1. Receiver Operating Characteristics (ROC) and Sensitivity Curves were generated on a Year One to Year Two cross-validation (withheld) data set.
  2. The model was applied to all individuals in the Year Two data in order to predict membership in the Year Three high cost class.
  3. All individuals in the Year Two data set received a Predictive Risk Score for Year Three.
  4. All Year Two members were rank ordered from highest to lowest, based on their predicted risk score for a Year Three outcome.
  5. Actual Year Three results were compared to the results obtained from the predictive model.
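Steps 3 to 5 amount to ranking members by predicted risk score and measuring what share of the actual high cost members falls inside a given screening threshold. A minimal sketch of that calculation follows; the function name, scores, and outcomes are hypothetical and only illustrate the mechanics.

```python
def capture_rate(scores, actual_high, threshold):
    """Share of actual high cost members found in the top `threshold` fraction by score."""
    n_screened = round(len(scores) * threshold)
    # Rank member indices from highest to lowest predicted risk score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(actual_high[i] for i in ranked[:n_screened])
    return captured / sum(actual_high)

# Hypothetical predicted risk scores and actual outcomes (1 = high cost) for ten members.
scores = [0.91, 0.15, 0.78, 0.40, 0.05, 0.66, 0.30, 0.22, 0.85, 0.10]
actual = [1, 0, 0, 0, 0, 1, 0, 0, 1, 1]

for t in (0.1, 0.2, 0.5):
    print(f"screen top {t:.0%}: capture rate {capture_rate(scores, actual, t):.0%}")
```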

Year One to Year Two model results achieved approximately an 81% accuracy rate in categorizing patients as high versus low cost, using Year One data to predict Year Two cost. A sensitivity analysis of Year Two data shows that a screening threshold of 10% is associated with a 42% true capture rate. In other words, intervening with only the top 10% of patients ranked by this model as future high cost would be expected to capture 42% of the true Year Two high cost patients. Likewise, screening thresholds of 25%, 50%, and 80% are associated with true capture rates of 66%, 85%, and 97%, respectively (see Table 2).

Year Two to Year Three predictive results accurately identified approximately 83% of future high cost patients using Year Two data to predict Year Three cost, which was slightly better than the Year One to Year Two results. Year Three data also depicted a high true positive capture rate, where screening thresholds of 10%, 25%, 50%, and 80% are associated with true capture rates of 41%, 66%, 86%, and 97%, respectively (see Table 2).
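One way to read these capture rates is as lift over random selection: screening a fraction t of members at random would be expected to capture about t of the true high cost members. The sketch below computes that ratio from the Year Three figures quoted in the paragraph above; the lift framing and the variable names are additions for illustration, not a measure reported in the study.

```python
# Year Three screening thresholds and true capture rates quoted in the text.
year_three_capture = {0.10: 0.41, 0.25: 0.66, 0.50: 0.86, 0.80: 0.97}

# Lift = capture rate divided by the share captured by random screening (the threshold).
lift = {t: rate / t for t, rate in year_three_capture.items()}

for t, value in lift.items():
    print(f"screen top {t:.0%}: capture {year_three_capture[t]:.0%}, lift {value:.2f}x")
```

Lift is largest at the tightest threshold and necessarily shrinks toward 1 as the screened share approaches the whole population.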

In general, the neural network predictive model produced very similar results, in terms of number of true positive captures for Year Two as well as Year Three data. That is, the model trained on Year One to Year Two data demonstrated comparable results for Year Two and Year Three. This result demonstrates the overall high external validity and repeatability (reliability) of the predictive model.

Enhancing Efficiency In Health Services

This study demonstrates that neural net predictive models accurately identify a large percentage of patients at high risk for future medical cost (approximately 79% to 84% probability of correctly identifying true future high-risk patients). For example, by intervening with only the top 10% of patients predicted by the model as being future high-risk, one would expect to capture at least 42% of the future true high-risk patients, and intervening with 50% would capture at least 85% of the future true high-risk patients.

The added value of the enhanced predictive capabilities introduced in this study is ultimately achieved by taking the output of the analytic findings (the people more likely to experience fully developed diabetes) and applying appropriate preventative measures to mitigate the potential future outcome. In this case, the increased accuracy of identifying individuals likely to experience fully developed diabetes enables health care providers and institutional organizations to better target preventative measures to those who most need them. This can be achieved by more accurately applying procedural measures such as the National Standards of Care described by the American Diabetes Association, according to descriptive and demographic attributes of high-risk patients. For example, a study completed by the Diabetes Prevention Program [2] concluded that people with pre-diabetes can prevent the development of type-2 diabetes by adjusting diet, exercise, and behavioral activities. Ultimately, self-management supported by education and by medical care from physicians, nurses, dieticians, and pharmacists, applying best practices and standards developed by research, can reduce the number of cases of fully developed diabetes and the costs associated with treating the disease.

It is possible to develop neural net models that identify considerably more future high-risk patients than many traditional rules-based or regression-based models [5]. The positive impact on the efficiency and cost effectiveness of care management intervention programs for at-risk health plan members is significant. Prospectively identifying high-risk patients and subsequently adjusting health care management interventions based on patients' risk stratification levels can improve the quality of care these patients receive and reduce the associated health care costs. The predictive model results reported here are based on a specific objective for a defined population at a given point in time. AMHC recognizes the value of predictive modeling and continues ongoing research with predictive models for positive health care outcomes through early identification and intervention.

1. Cousins, M., Shickle, L., and Bander, J. An introduction to predictive modeling for disease management risk stratification. Disease Management Journal 5 (2002), 157-167.

2. DPP Research Group. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. New England Journal of Medicine 346, 6 (2002), 393-403.

3. Grana, J., Preston, S., McDermott, P.D., and Hanchak, H.A. The use of administrative data to risk-stratify asthmatic patients. American Journal of Medical Quality 12 (2002), 113-119.

4. Hazen, G. Preference factoring for stochastic trees. Management Science 46, 3 (2000), 389-403.

5. Kiernan, M., Kraemer, H., Winkleby, M., King, A., and Taylor, C. Do logistic regression and signal detection identify different subgroups at risk?: Implications for the design of tailored interventions. Journal of Philosophy, Psychology and Scientific Methods 6 (2001), 35-48.

6. McLaughlin, C., Yang, S., and Van Dierdonck, R. Professional service organizations and focus. Management Science 14, 7 (1995), 1185-1193.

7. Shelton, P. Disease management programs: The second generation. Disease Management and Health Outcomes 10, 8 (2002), 461-467.

Stephan Kudyba is a faculty member in the Department of Management, New Jersey Institute of Technology, University Heights, CAB Building, Newark, NJ.

G. Brent Hamar is director of informatics at American Healthways, Inc., Nashville, TN.

William M. Gandy is senior director of informatics at American Healthways, Inc., Nashville, TN.

Figure. Comparative analytic results.

Table 1. Model input factors.

Table 2. Meaningful threshold points for Year Two and Year Three sensitivity analyses.

©2005 ACM  0001-0782/05/1200  $5.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc.

