Opinion
Computing Applications Viewpoint

Soft Infrastructure Challenges to Scientific Knowledge Discovery

Seeking to overcome nontechnical challenges to the scientific enterprise.

Open networked environments have become essential in the sciences, enabling accelerated discovery and communication of knowledge. The genomics revolution, for example, involved gene-sequencing machines that accelerated genome mapping. Yet the real revolution began when open community databases allowed researchers to build on existing contributions and compare their results with established knowledge. Such payoffs have spread to other fields through the exploitation of open geographic information systems and open social network data. In another example, "citizen science" engages people not previously involved in science in data collection and analysis via technology-supported, distributed collaboration. Transforming scientific knowledge discovery requires collaboration among specialists in the domain sciences (for example, chemistry, geology, physics, and molecular biology), the computing sciences, and the human sciences. The very nature of scientific knowledge discovery is changing.

The computing research community should be interested in these developments. Computing research is itself affected by these changes, and computing in general plays a special role in the technologies that make such progress possible. The National Research Council’s (NRC) Board on Research Data and Information (BRDI) has been working on this topic (see the accompanying sidebar, "The National Research Council's Board on Research Data and Information Workshop"). New techniques and methods can deliver new benefits in less time and at lower cost. However, there are also significant barriers that are nontechnical in nature, and this Viewpoint highlights them.

Digital computing technologies (data mining, information retrieval and extraction, artificial intelligence, distributed grid computing, and the like) are important, but they are only part of the cast in the larger play of computer-mediated knowledge discovery. Online availability of data and published papers affects the research life cycle through rapid dissemination of research results and broader participation in research activity.[a] Exploiting such opportunities requires moving beyond technical challenges, which are frequently difficult to overcome in their own right, to the challenges of "soft infrastructure": the institutional factors, governance, and cultural inertia that tend to impede payoffs from the rapid evolution of techniques and methods. The promise is great, but it will be realized only if the complementary assets of production, including those of soft infrastructure, are provided.

Soft infrastructure ranges from the psychology of individual action to institutional reward structures and intellectual property conventions. Technology enables improvements and stimulates new thinking, but fitting new technology to existing practice can violate social protocols that have been refined and embedded over centuries. These protocols include the attitudes and practices of researchers, publishers, reviewers, and university promotion committees, and they are part of the culture of knowledge discovery. They predate technology-enabled, open networked environments, and many of them persist for sensible reasons. Dismissing them as "resistance to change" is dysfunctional.

In fact, the culture of knowledge discovery is open to change. For example, most university promotion committees now accept that some areas of computing research consider refereed conferences more important than refereed journals. This change took effort: it required persuasion and the authoritative efforts of the ACM, the Computing Research Association (CRA), and the National Research Council’s Computer Science and Telecommunications Board (CSTB). The change was the "right thing to do," but it took much work and a number of years.

The concerns underlying soft infrastructure are important but often overlooked. Few scientists want to change a reward structure based on results (for example, contributions to scientific knowledge), even though that structure discourages smart researchers from sharing anything before the rewards are worked out. Similarly, publishers, universities, and others are reluctant to give up intellectual property rights, and useful research efforts have been derailed by the inability to reach agreement on such matters. This is not blind "resistance to change." It is smart, at least in the short run. But is it smart in the long run? If not, how can the culture of knowledge discovery be changed? This challenge is exacerbated by the increasingly global nature of research. Different institutional approaches, languages, norms, and levels of development create challenges that will take time to sort out.

Innovative ways can be found to leverage open networked environments. Scientific knowledge discovery can be improved with the support of science-funding agencies, research universities, and science and engineering professionals. To be effective, however, these efforts require attention to what economists call complementary assets: the full set of elements required to gain hoped-for benefits. An example is data curation: preparing and maintaining data with potential for reuse. This involves deciding which data will be kept, how the data are described (for example, through metadata), how quality control will be maintained, and how the coding schemes and analytical tools that enable reuse will be provided. Whose job is data curation? Researchers frequently lack the skills and inclination to take on this work, and they assume that others, such as academic librarians, will do so. Yet where academic library resources are strained, librarians cannot take on extra work.

Serious power issues can arise from such challenges. Researchers who object to anyone but themselves controlling the aspects of scientific work they see as essential might refuse to take on additional work they see as nonessential. To continue the example, researchers and academic librarians have cooperated under conventions created over decades and grounded in a different era. If researchers insist that academic librarians take on additional work the librarians cannot afford, a power conflict can emerge. It is difficult to change an equilibrium that has worked well in order to achieve the new benefits of scientific knowledge discovery in open networked environments. Citizen science raises similar concerns. Citizen science is growing: the Cornell Lab of Ornithology’s eBird project and Galaxy Zoo in astronomy are but two examples, each involving tens of thousands, if not hundreds of thousands, of people who have never been socialized into research work. Such people may make unconventional demands if they feel they are not properly compensated for their important efforts. Such power conflicts do not arise from open networked environments per se, but from the new opportunities these environments enable in circumstances of constrained resources.

When confronting challenges involving rewards, power, and conflict, it is common to enlist the economic, social, and behavioral sciences (what some refer to as the "human" sciences). This effort resembles the enlistment of the computing sciences over the past three decades, and lessons from that experience are germane to the current situation. "Enlistment" is a telling notion: the human sciences are no more willing to be ordered to do such work than the computing sciences were before them. Neither wishes to offer poorly formulated solutions to poorly understood problems. Neither sees its community as "hired guns" whose only purpose is to fix the problems faced by other scientists. Both are interested in making progress in their own fields, and only when that interest is addressed are they willing to discuss work that also benefits other sciences. Edicts requiring interdisciplinary work seldom persuade the best scientists from different fields to collaborate. The human sciences are needed to realize the promise of scientific knowledge discovery in open networked environments, just as the computing sciences have been. In both cases, the art of effective "deal making" is still evolving.

A final complicating factor is the changing value proposition of research. During much of the 20th century the U.S. research enterprise capitalized on the benefits of scientific agriculture, advanced the industrial revolution, improved human health, and helped achieve victory in conflicts such as World War II and the Cold War. Knowledge discovery was a public good, and more was better. Now, politicians and policymakers acknowledge the value of scientific knowledge discovery, but at the same time they ask how much is needed, at what price, paid for by whom, and for whose benefit. Scientific knowledge discovery has become important. Important things become political. The political salience of science is unlikely to translate into a blank check for scientists to spend as they choose. If anything, science is increasingly subject to calls for cost-benefit analyses in a pluralistic political environment where a shared value proposition (needed for coherent cost-benefit outcomes) is increasingly difficult to establish. Influential people who value open scientific knowledge discovery also invoke practical considerations such as economic growth, national security, health improvements, and other societal goals. Open networked environments can enhance scientific knowledge discovery, but the political tensions surrounding science are likely to grow.


In short, open knowledge discovery can improve the scientific enterprise. One can think of this as a mandate created by technological progress, particularly in the digital realm. Nevertheless, the success of research, broadly considered, depends on the appropriate management of the underlying soft infrastructure.

    a. An Internet search for "research life cycle" images (including the quotes) yielded more than 170,000 links as of July 2014. Research is a set of activities that take place over time, sometimes presided over by different people. Open networked science can change many aspects of the research life cycle.

    The authors thank J Strother Moore and two anonymous reviewers for their help with this Viewpoint. The views expressed here are those of the authors and should not necessarily be ascribed to their employers.

