Over the past decade, the mountains of data accumulating within firms - expanding at an annual rate of 30-50% and fueled in part by recent legislation such as SEC Rule 17a-4, which mandates that financial services firms retain electronic communications (for example, email and instant messaging) for periods of up to three years - have forced information technology (IT) executives to ask how this data deluge should be managed. Intel, whose data warehouse is currently estimated at over 30 petabytes, has increased its data storage by an average of 35% annually and expects that, if current trends continue, its data center could expand to 165 petabytes by 2014. Yet even as data volumes climb, causing data centers to double in size every other year, an innovation-led drop in annual per-gigabyte storage costs of 35-45% (see Figure 1) has failed to halt the rise in storage spending.2,8
With organizing and utilizing data now seen as one of the most critical issues facing firms worldwide,3 IT executives note that non-discretionary IT spending - 25% of which covers information management costs and storage infrastructure in particular - has now reached a point where strategic IT initiatives are at risk.2 Hence, CIOs have a pressing need to understand the dynamics of information management costs and their ability to control these costs through carefully chosen policies governing data collection and retention. Knowing how these costs behave, CIOs can continue to support the data needs of their firms without compromising their strategic IT goals.
To begin this process, we examine a tiered information framework that, by considering the value of information, allows CIOs to comprehend the interplay of market forces that shape information costs.7,12 We then review several challenges posed by our framework that future academic research can help to resolve.
Of the six cost categories seen in Table 1, information management costs are primarily shaped by two opposing forces - namely, better, faster, and cheaper technology, which exerts downward pressure on costs, and a tendency among firms to collect data on every facet of their business, which leads to increased demand for storage capacity and an associated increase in costs. Since firms are unlikely to rip and replace their storage infrastructure each time a new technology innovation enters the marketplace, the dominant force behind the sudden rise in information management costs is the near-exponential growth in the volume of data being collected and held by firms. The problem is not simply the cost of storage infrastructure (disk arrays, device management software) but the labor and overhead costs associated with backup, recovery, and storage administration. The reality today is that the collection and retention of data are expanding faster than new technology innovation can neutralize or compensate for any demand-driven increase in storage costs. Firms are, in effect, choking on their own data.
Even as firms struggle to accommodate new data, there remains the question of how to deal with existing data stored on legacy systems. An analogy is instructive: just as car drivers are reluctant to replace their cars each year simply because of new airbag or safety technology, better fuel economy, or a nicer interior, there is nevertheless a point at which the high cost of maintaining a long-since depreciated car exceeds the capital depreciation costs of a new one. Consequently, IT managers need to recognize that information management costs arise not merely from new data but also from existing data, and that in the latter case especially, there may come a time when migrating this data to new, better, faster, and cheaper storage infrastructure becomes an increasingly attractive proposition. Knowing this, vendors have started to price new storage hardware to match the maintenance costs of older hardware. Firms may mistakenly see this as neutralizing overall storage spending, but matching hardware prices to maintenance costs does not eliminate the labor, overhead, and migration costs that accompany any move to new infrastructure.
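The keep-versus-replace decision above can be sketched as a simple cost comparison. The following is a minimal illustrative model, not a prescribed method from this article; all dollar figures and the planning horizon are hypothetical.

```python
def should_replace(old_annual_maintenance: float,
                   new_annual_cost: float,
                   migration_cost: float,
                   horizon_years: int) -> bool:
    """Replace legacy storage when the cost of keeping it over the planning
    horizon exceeds the cost of new infrastructure plus the one-time cost
    of migrating data onto it (illustrative model only)."""
    keep_cost = old_annual_maintenance * horizon_years
    replace_cost = new_annual_cost * horizon_years + migration_cost
    return replace_cost < keep_cost

# Hypothetical figures: $120K/yr to maintain depreciated hardware vs.
# $70K/yr (depreciation + operations) for new hardware plus a $90K
# one-time migration; over 3 years replacement is cheaper.
print(should_replace(120_000, 70_000, 90_000, 3))  # True
```

Over a one-year horizon the same figures favor keeping the old hardware, which mirrors the article's point that better, faster, cheaper technology does not by itself justify replacement.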
A further complicating factor in firms' efforts to manage information costs is the fact that not all data are created equal. For example, while a financial services firm could easily survive without access to its employee payroll data for 24 hours, the loss of key customer account data for any length of time could prove financially catastrophic. In such circumstances, an IT manager is less likely to favor a simple cost-minimization approach to ongoing data management, fearing that the impact of delayed access or systems failure could outweigh any promised cost savings from pursuing a low-cost infrastructure. Instead, service-level parameters such as reliability, availability, security, redundancy, response times, maintainability, and scalability are more likely to influence infrastructure utilization decisions, even if these higher service levels cause an increase in information management costs. If IT managers are required to maintain high service levels for data that must be readily accessible, the fear that "you get what you pay for" will push them to migrate data to newer and more reliable storage infrastructure sooner rather than later, even if the capital costs of the existing storage infrastructure are not yet fully depreciated.
In contrast, low-cost rather than service-level criteria are more likely to influence the management of non-critical data. For example, in the case of archival data where retention is mandated by SEC rules or other areas of legislation, service levels that allow for real-time access or synchronous offsite backup will drive up costs without enhancing the competitiveness or performance of the firm. Therefore, in the case of non-critical data, an IT manager may be able to compromise on service levels (up to a point) in pursuit of cost savings. This could entail migrating data to cheaper storage media (for example, tape rather than disk despite the higher risk of media failure and limited data access) or postponing a purchasing decision until the cost savings from new storage infrastructure exceed the operating costs of the existing storage infrastructure.
Therefore, information management costs are impacted not only by the growing volumes of new and existing data, but also by the value to the firm of the information distilled from these data. As information value rises, it will become increasingly difficult to contain information management costs since a sizeable proportion of that cost constitutes a quasi-insurance premium against data loss or corruption. The fact that next-generation storage infrastructure is better, faster, and cheaper does not automatically cost-justify the replacement of existing infrastructure unless an improvement in infrastructure reliability and quality helps to expand the firm's service-level offerings. As information value falls, there is less need to maintain high service levels that would only increase information management costs, and so there may be opportunities to migrate data to cheaper media or to otherwise lower costs through reduced service levels. In this way, information management costs reflect different service levels that in turn reflect differences in the value of the underlying information.12
Once we know the value of information - by, for example, determining how much additional profit or market share a firm could gain from using the information, or how much could be lost in legal penalties or foregone profits if the information is inaccessible7 - we can define an appropriate service level. However, as reported in the information life cycle management (ILM) literature, valuing information is a non-trivial exercise since information value can change over time.4 For example, an airline passenger manifest is highly valuable while an aircraft is in flight, but that value declines to zero (or close to zero) at the precise moment the aircraft lands safely. Meanwhile, in the pharmaceutical industry, data from clinical trials can increase in value as a drug moves through successive stages of FDA approval. The cost to the firm of data loss climbs with the expected market value of the drug as each stage in the approvals process is successfully traversed, even though the data itself remains unchanged.
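The clinical-trials example can be expressed as a simple expected-value calculation: the value of unchanged trial data rises as the probability of final approval rises. This is a sketch under assumed figures - the $500M market value and the stage probabilities are hypothetical, not drawn from the article.

```python
def expected_information_value(market_value: float,
                               p_final_approval: float) -> float:
    """Expected value of clinical-trial data at a given approval stage:
    the drug's projected market value, discounted by the probability of
    clearing the remaining FDA stages (all figures hypothetical)."""
    return market_value * p_final_approval

# As a drug clears successive stages, the probability of final approval
# rises, so the value of the same unchanged trial data climbs with it.
stage_probs = [0.10, 0.30, 0.65, 0.90]  # hypothetical per-stage odds
values = [expected_information_value(500e6, p) for p in stage_probs]
print([round(v / 1e6) for v in values])  # [50, 150, 325, 450]
```

The same function run with a probability falling to zero captures the passenger-manifest case, where value collapses the moment the flight lands safely.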
ILM, therefore, adds a further layer of complexity to the management of information costs in that non-critical or low-value information at one point in time could become critical or high-value at a later time, or vice versa. Accordingly, firms need to evaluate information value on an ongoing basis, not just for new data flowing into the firm but for data already held in data marts and archives or scattered on hard drives, thumb drives, CDs, and other media. As the value of information is assessed, there is a need to identify whether the storage infrastructure is providing a sufficient level of service, minimizing costs without elevating the risk of data loss or corruption to extreme levels.12 In some instances, storage spending could be reduced where risk is negligible. Not all data are equally valuable, and so it is economically unwise to treat all data with the same degree of care and attention. This does not mean, however, that low-value data should be deleted, for, as legal e-discovery events have shown, low-value data can later increase in value.
One way to limit this complexity, and so begin to understand the dynamic nature of information management costs, is to consider a tiered information framework that reflects an ordering of information value throughout the firm. New data can flow directly into a tier, or existing data can migrate to other tiers to reflect a change in its underlying information value. As shown in Figure 2, we illustrate this approach using a three-tier structure, where tier 1 contains high-value or critical data, tier 2 contains medium-value data, and tier 3 contains low-value or non-critical data. In reality, while the number of tiers and the ranges of information values that separate each tier are uniquely defined by each firm's business requirements,5 archiving practices mean that the demand for storage capacity will be greater in lower tiers than in higher tiers. For example, under Sarbanes-Oxley and SEC rules, publicly traded firms are now required to retain electronic records of financial transactions for up to seven years.11 For many firms, this entails a significant archival undertaking and a substantial investment in storage capacity.
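The tier assignment at the heart of this framework can be sketched as a mapping from an information-value score to a tier. The normalized 0-to-1 score and the cut-off points below are hypothetical; as noted above, each firm would set its own tier boundaries from its business requirements.

```python
def assign_tier(info_value: float) -> int:
    """Map a normalized information-value score (0..1) onto the three-tier
    framework. The thresholds are illustrative placeholders for the
    firm-specific boundaries the framework calls for."""
    if info_value >= 0.7:
        return 1   # high-value / critical: premium service levels
    if info_value >= 0.3:
        return 2   # medium-value: midrange infrastructure
    return 3       # low-value / archival: cost-driven storage

print([assign_tier(v) for v in (0.95, 0.5, 0.1)])  # [1, 2, 3]
```

Re-running such a function as valuations are reassessed is what triggers the inter-tier migrations the framework describes.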
Behind each tier lies a collection of hardware, software, and networking infrastructure that varies in complexity and reliability. In tier 1, for example, high-value data that are accessed frequently by ERP or CRM applications could be held on high-end storage servers such as EMC Symmetrix that are connected to application servers and other networks via fiber optic links. Firms cannot afford to allow tier 1 data to be offline for any period of time, and so it is advisable that tier 1 data be replicated in real time to backup servers at a hot-site facility and that tier 1 infrastructure be able to seamlessly provision additional storage capacity. Tier 2 contains less valuable data, such as payroll records or sales commission schedules - data that, while important to the business, are unlikely to be accessed frequently. These data could be held on midrange systems such as EMC CLARiiON or Sun StorEdge. Lastly, tier 3 holds non-critical data on low-end devices such as Dell PowerVault or in backup tape libraries. The low value of these data means that service levels can be lowered, subject to the caveat that the data may be needed for regulatory or auditing purposes in the future. Since infrastructure tends to become visible only when it breaks down, aggressive cost cutting - even within tier 3 - could create an unnecessary level of risk for the firm.
Overall, the primary difference between the tiers' storage infrastructures relates to their service-level capabilities. For example, tier 1 infrastructure supports applications that routinely require real-time data access, and so connectivity, scalability, speed, redundancy, and fault tolerance are desirable and necessary qualities. Moving to lower tiers in the framework, the need for superior service levels gradually abates since the underlying information value is much lower, at which point cost becomes the dominant concern.
Once this tiered information framework is in place, it becomes easier to identify opportunities for managing information costs. These opportunities could lead to a reduction in costs, but since the intent of the framework is to help IT managers identify an appropriate level of cost, it could also signal a need to increase costs to safeguard high-value data. What is certain, however, is that future innovation will help to reduce information management costs, and so whether managers are coping with the influx of new data or reacting to changes in the value of existing data, they must still determine how to incorporate better, faster, and cheaper storage infrastructure into their organizations. As seen in Figure 3, our framework can help to clarify this issue by showing how new infrastructure fits into the various storage tiers in either a top-down or bottom-up manner, reflecting the relative desirability of improved service levels or cost savings.
As mentioned earlier, the management of the top tiers of the framework is motivated by the need for superior service levels, whereas cost control is more important at lower tiers. As data volumes rise and IT managers are forced to identify which tiers are most deserving of new storage investment, their decision will hinge upon whether there is a greater need for enhanced service levels within the upper tiers or for cost savings within the lower tiers. If improved service levels are the greater need - an issue that is more likely to apply to data of a strategic or competitive nature in tier 1 - faster and more reliable storage infrastructure will be applied first to tier 1, and then sequentially to lower tiers. If cost reduction is the greater need - something that is more likely to apply to lower tiers, given the larger volumes of archival data and the use of older and potentially less reliable storage infrastructure there - new storage investment will be applied first to tier 3 and then sequentially to higher tiers.
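The top-down and bottom-up sequencing described above reduces to a simple ordering rule. The sketch below assumes the three-tier structure of Figure 2; the "service"/"cost" labels are our shorthand for the two investment motivations, not terminology from the article.

```python
def investment_order(priority: str, n_tiers: int = 3) -> list:
    """Order in which tiers receive new storage investment: top-down when
    improved service levels dominate, bottom-up when cost savings
    dominate (a sketch of the sequencing described in the text)."""
    if priority == "service":      # tier 1 first, then downward
        return list(range(1, n_tiers + 1))
    if priority == "cost":         # tier 3 first, then upward
        return list(range(n_tiers, 0, -1))
    raise ValueError("priority must be 'service' or 'cost'")

print(investment_order("service"))  # [1, 2, 3]
print(investment_order("cost"))     # [3, 2, 1]
```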
This top-down and bottom-up sequencing of new investment only makes sense in the context of a tiered approach to information management. For firms that have yet to structure their data in this manner - in effect, storing data on similar devices regardless of information value - the prudent approach would be to upgrade their storage infrastructure as often as possible in order to protect their highest-value data. However, firms in this situation are unlikely to have optimized their storage infrastructure, and so they will either over-invest in infrastructure and over-insure against threats to information value, or under-invest and potentially expose the firm to catastrophic financial losses from data loss and corruption.12
While our tiered information framework can improve data management by unraveling the relationship between information value, service levels, storage infrastructure, and information management costs, a number of issues have yet to be resolved and require further investigation. For example, if information value changes within a short timeframe, as in our example of an airline passenger manifest, the frequent migration of data between tiers (in this example, from tier 1 to tier 3) could incur significant transaction costs and potentially disrupt the dynamics of how information is managed. Any attempt to generate cost savings by moving data to a lower-cost storage tier may need to consider not just the potential per-gigabyte cost savings but also the added risk of storing the data on a less reliable architecture.
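One way to frame that migration decision is to weigh cumulative savings against the one-time transaction cost plus an expected cost for the added risk. This is a minimal sketch under assumed figures - the dollar amounts and the 1% annual loss probability are hypothetical, and a real model would need firm-specific estimates.

```python
def worth_migrating(annual_savings: float,
                    migration_cost: float,
                    p_loss_per_year: float,
                    loss_impact: float,
                    years: int) -> bool:
    """Migrate data to a cheaper tier only if cumulative savings exceed
    the one-time migration (transaction) cost plus the expected cost of
    the less reliable architecture (illustrative model only)."""
    expected_risk_cost = p_loss_per_year * loss_impact * years
    return annual_savings * years > migration_cost + expected_risk_cost

# Hypothetical: $40K/yr savings over 3 years vs. a $25K migration cost
# and a 1% annual chance of a $2M loss event on the cheaper tier.
print(worth_migrating(40_000, 25_000, 0.01, 2_000_000, 3))  # True
```

Raising the assumed loss probability or impact quickly flips the answer, which is the article's caution: per-gigabyte savings alone are not a sufficient basis for moving data downward.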
Equally relevant is the question of how information should be valued and how often that value should be reassessed. Information value is significantly determined by how information is used and can depend on access to complementary resources and skills.10 For example, a firm could purchase point-of-sale data from a third party and yet lack the internal expertise to take advantage of it. Information value is also driven by compliance and the need to avoid the legal fees and penalties that could follow from an inability to reproduce data in a timely manner for an SEC or FDA audit. In light of recent accounting scandals, firms do not want to tarnish their reputations by taking unnecessary risks with data. Research can help to identify best practices for factoring such complex elements into a storage investment decision.
What is clear from this discussion is that storage, and information management more broadly, are no longer mere tactical issues that can be addressed at an operational level. If information is critical to the success of the firm, senior management must help to design a strategy and a set of policies detailing how information should be managed. Failure to acknowledge differences in information value through a tiered information framework, such as the one outlined in this article, could needlessly expose firms to significant risks.
A rapid decline in per-gigabyte storage costs has fostered a perception among IT practitioners that innovation alone can contain information management costs. The fact that non-discretionary spending now consumes 63% of IT budgets, and that storage costs are partly to blame,2 suggests that firms cannot ignore how data are managed. If firms continue to collect and retain data at present rates amidst demands to contain rising storage expenditure, we can expect data management to become more complex. Growing amounts of unstructured data (email, video, images) will add to that complexity. Treating all data equally, as many firms have done in the past, is no longer suitable when information value varies widely and when the service-level output of firms' storage infrastructure also varies. A tiered information framework is key to understanding the dynamics of information management costs, not only as a way to control costs but as a way to highlight areas where additional investment is needed to protect information value and ultimately secure the survival of the firm.
3. Computer Sciences Corporation. Critical Issues of Information Systems Management. Cambridge, MA, 2001. http://www.csc.com/features/2001/40.shtml
4. EMC Corporation. Information Lifecycle Management, 2004. http://www.emc.com/ilm/
8. Gilheany, S. The Decline of Magnetic Disk Storage Cost Over the Next 25 Years. Berghell Associates, 2004. http://www.berghell.com/whitepapers/Storage%20Costs.pdf
9. Lyman, P. and Varian, H. How Much Information? University of California, Berkeley, School of Information Management and Systems, 2003. http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
11. Securities & Exchange Commission (SEC). Final Rule: Retention of Records Relevant to Audits and Reviews. http://www.sec.gov/rules/final/33-8180.htm
©2010 ACM 0001-0782/10/0500 $10.00