BLOG@CACM

Researchers’ Big Data Crisis; Understanding Design and Functionality

The Communications Web site features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we'll publish selected posts or excerpts.


Michael Stonebraker issues a call to arms about research groups' data-management problems. Jason Hong discusses the nature of functionality with respect to design.
  1. Michael Stonebraker "Big Data, Big Problems"
  2. Jason Hong "Design, Functionality, and Diffusion of Innovations"
  3. Reader's comment
  4. Authors
  5. Footnotes
Michael Stonebraker "Big Data, Big Problems"
January 14, 2011

I was at a conference recently and talked with a science professor at another university. He made the following startling statement.

He has close to 1 petabyte (PB) of data that he uses in his research. In addition, he surveyed other scientific research groups at his university and found 19 other groups, each with more than 100 terabytes (TB) of data. In other words, 20 research groups at his university have datasets ranging from 100TB to 1PB in size.

I immediately said, "Why not ask your university’s IT services to stand up a 20-petabyte cluster?"

His reply: "Nobody thinks they are ready to do this. This is research computing, very different from regular IT. The trade-offs for research computing are quite different from corporate IT."

I then asked, "Why not put your data up on EC2?" (EC2 is Amazon’s Elastic Compute Cloud service.)

His answer: "EC2 storage is too expensive for my research budget; you essentially have to buy your storage every month. Besides, how would I move 1PB to Amazon? Sneaker net [disks sent to Amazon via U.S. mail] is not very appealing."
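His objections are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below estimates the monthly bill for keeping 1PB in cloud storage and the time needed to upload it over a fast link; the per-GB price and link speed are illustrative assumptions, not Amazon's actual rates.

```python
# Back-of-the-envelope estimate. The $/GB-month price and the link
# speed are illustrative assumptions, not actual published rates.
PB_IN_GB = 1_000_000  # 1 PB expressed in GB (decimal units)

storage_price_per_gb_month = 0.10  # assumed storage price, $/GB-month
monthly_bill = PB_IN_GB * storage_price_per_gb_month
print(f"Storing 1 PB: ${monthly_bill:,.0f} per month")

link_gbps = 1.0                        # assumed sustained uplink, Gbit/s
seconds = (PB_IN_GB * 8) / link_gbps   # GB -> gigabits, then divide by rate
days = seconds / 86_400
print(f"Uploading 1 PB at {link_gbps} Gbit/s: about {days:.0f} days")
```

At these assumed rates, storage alone runs into six figures per month, and even a dedicated gigabit link needs roughly three months of sustained transfer to move the data, which is why shipping disks ("sneaker net") or standing up a local server looks attractive by comparison.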

As a result, he is in the process of starting a federation of 20 research groups that will stand up the required server. In other words, this consortium will run its own massive data server.

I am reminded of a talk given a couple of years ago by James Hamilton, then at Amazon. He claimed there are unbelievable economies of scale in running grid-oriented data centers (that is, if you run 100,000 nodes, your costs are a small fraction of the costs of running a 1,000-node data center). Many of these cost savings come from unexpected places. For example, designing a physical data center (raised flooring, uninterruptible power supply) is something the small guy does once and the big guy has down to a science. Also, personnel costs rise much more slowly than the number of nodes.

I assume at least 20 universities have the same characteristics as the one noted above. My assumption is also that these 20 × 20 = 400 research groups get their funding from a small number of government agencies. It would make unbelievably good sense to have a single 400PB system that all of the researchers share.

In effect, this blog post is a "call to arms." Agencies of the U.S. government are spending boatloads of money on pushing the envelope of massive computer servers. However, they appear to be ignoring the fact that many research groups have serious data-management problems.

Why not invest a small fraction of the "massive computing" budget on "massive data management"? Start by standing up a 400PB data server run by somebody who understands big data. Several organizations with the required expertise come readily to mind. This would be a much better solution than a whole bunch of smaller systems run by consortiums of individual science groups.

There must be a better way. After all, the problem is only going to get worse.


Jason Hong "Design, Functionality, and Diffusion of Innovations"
February 6, 2011

A few months ago, I wrote two CACM blog entries examining why great design is so hard. There have been a lot of great comments and insights from a bunch of people.

There’s one comment in particular I’d like to respond to because it poses a good question about the nature of functionality with respect to design. In the comment, Mark Tuttle argues that the functions offered by a system outweigh design, citing examples of time-sharing systems, email, Unix, bitmapped interfaces, MEDLINE, and query languages for early relational databases.

To a large extent, I do agree on the importance of functionality. If we had a system that could predict tomorrow’s stock market prices but was completely unusable, I’m sure we’d still see a lot of people making the effort to learn how to use it.

However, functionality and design aren’t separate things. A large part of design includes understanding what needs people have and what technologies can be applied to solve those needs. Design isn’t just about the user interface "skin" of graphics, icons, and aesthetics that people see. It includes the internal "skeleton" of how the application is organized, the conceptual model and metaphors conveyed to end users, as well as its functionality.

It’s also worth pointing out that in many of these examples about functionality, the systems were designed by and used by people with intimate knowledge of software. The designers already had a deep understanding of what the problems at hand were and how people did their work. In other words, the designers were the users. That’s not really the case anymore, though. Information and communication technologies are pervasive in all aspects of modern society, from finance to manufacturing, from health care to consumer products. We can’t rely solely on our intuitions anymore because the users are no longer like us.

Second, while functionality is a key differentiator for technologies in the very early stages of adoption, it isn’t as strong a draw in the middle and late stages, especially when competitors have arrived that offer products with comparable functionality.


A lot of researchers have developed models of technology adoption. My favorite is the one presented by Everett Rogers in the well-known book Diffusion of Innovations, which summarizes the literature in that area. Rogers outlines five major factors that influence whether or not people adopt a given innovation. Note that in Rogers’ book, an innovation can be not only a technology, but also a process or a habit. Here, I will focus only on technologies.

The five factors Rogers identifies are: relative advantage, compatibility (with one’s beliefs and existing installed base), complexity, trialability (how easy it is to try the technology), and observability (how easy it is to see others benefit from the technology). Rogers also presents the well-known technology adoption curve that segments the population based on when they adopt an innovation, labeling people as innovators, early adopters, early majority, late majority, and laggards.

At this point, you can probably see where I’m going. Innovators and early adopters have a higher tolerance for risk and complexity, and often have significant pains that need to be addressed now. In these cases, functionality dominates factors like aesthetics, usability, simplicity, and cost. If you have ever seen (or, worse, had to use) anything created for the Department of Defense, you will probably agree it fits this description quite well.

However, the early majority, late majority, and laggards have very different profiles and thresholds for complexity and value. In these cases, the ability to create a product that fits into people’s lives plays a significant role, especially when competing products are available. This is where great design matters a lot, as demonstrated by products like the Nintendo Wii and the iPod. When the Wii first came to market, it was competing against the large and well-established base of Xbox and PlayStation consoles, and succeeded by dramatically simplifying game play and targeting casual gamers rather than hard-core gamers. The iPod came out a few years after the first MP3 players were already being sold, and made huge inroads by forging a strong emotional connection to people with its sleek form factor and fun user interface.

At its core, interaction design is about understanding at a deep and visceral level the needs, desires, values, and processes of people, and then applying those insights in the creation of technology. It’s about empathy, seeing and experiencing the world from the user’s point of view. And it’s about always remembering the mantra "The user is not like me."


Reader’s comment

I totally agree with you. It is very important to address the issues of early adopters of the product. Also, like you pointed out regarding the Nintendo Wii, the emphasis on great design and innovation in a product/software is always dependent on how established the competitor’s product/software is. Whereas, for established products, like Microsoft Office, change and innovation in the product’s design has greater risk as users tend to get confused, like with the ribbon interface in Microsoft Office. UX, on the whole, is a major factor in early stages of adoption, but emphasis on it gradually fades away as a product becomes more popular (and users don’t like a change).

Developers and designers are in peril when they ignore functionality and UX after a product becomes well established, as they run the risk of competitors rolling out a product with the same functionality but better UX.

Rahul Kavi


Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of database technology. Hence, his opinions should be considered in this light.
