Data Mining Meets City Hall

Local and national governments are turning to open data to cut their costs, increase transparency and efficiency, and respond to the needs of citizens.
New York City's chief digital officer Rachel Sterne during the first day of Internet Week NY, in June 2011.

By now, the refrain is familiar to many New Yorkers: “In God we trust. Everyone else, bring data.” It is a phrase, and a mind-set, that the city’s mayor, Michael Bloomberg, tirelessly promotes. Over the course of his 10-year tenure, Bloomberg has transformed a domain where statistical analyses were confined to opinion polls into one where data-driven transparency is hailed as a means of cutting costs, increasing government efficiency, and engaging often-jaded citizens.

The Bloomberg administration has tracked its performance through indicators like infant mortality rates, which have fallen by nearly 20% since Bloomberg took office in 2001, and responses to reported emergencies, which took a minute longer in 2011 than in 2001. It has also used predictive analytics to investigate potential fire risks. Locations flagged as high risk are investigated within 48 hours, and six months after the program’s implementation, fire department personnel and buildings inspectors found seriously hazardous living conditions in more than 75% of their inspections, an increase of 400% over the previous two years. And last year the Bloomberg administration created NYC Digital, an office whose goal is to “realize New York City’s potential as the world’s leading digital city.” Led by the city’s first chief digital officer, Rachel Sterne, NYC Digital has organized hackathons, app competitions, and other initiatives to encourage the use of municipal data.

Open data “democratizes the exchange of information and services,” Sterne explained at the 2011 O’Reilly Strata Conference in New York. “It empowers citizens to collaboratively create solutions. It’s not just the consumption but the coproduction of government services and democracy.”

Increasingly, New York is not alone in its commitment to open data and transparency. North American cities like Chicago, Portland, San Francisco, and Vancouver have begun to release their datasets. The U.S. government has also done so, as have the governments of Australia, Estonia, New Zealand, Norway, and the United Kingdom, along with global organizations like the United Nations and the World Bank. The catalysts for their change are varied, but recession-squeezed budgets often play a significant role.

“It wasn’t really an innovation-centric approach at first,” admits Dustin Haisler, the former Chief Information Officer for the City of Manor, TX, who won plaudits for his use of QR codes to distribute information on city projects and historic buildings. “We were boxed into a corner financially, and it grew out of a search for alternative ways to solve problems.”

To that end, many participants see open data not just as a way for governments to track their own progress, but as a way to enable citizens to work together to improve government.

Much of the excitement revolves around municipal initiatives. “Cities are where most people understand their government and interact with it,” says Jen Pahlka, the founder and executive director of Code for America, a nonprofit organization that connects developers with municipal governments through a year-long fellowship program, a “brigade” of volunteer professionals, and a soon-to-be-launched start-up incubator. Thus far, Code for America has leveraged public data to help citizens to keep up with legislation in Philadelphia, browse public school information in Boston, and locate retailers that accept food stamps in every city in the U.S.

Data Transparency

Organizations like Code for America are part of an increasingly vibrant ecosystem that works to promote data transparency. Typically, data advocates balance basic work to open new datasets and encourage the adoption of global standards with purpose-driven projects like creating apps and application programming interfaces (APIs). The Washington, D.C.-based Sunlight Foundation, for example, employs a team of investigative journalists to obtain data, such as information about political junkets and contributions, through Freedom of Information Act requests. It also houses a policy team that engages in direct lobbying, a team that organizes volunteers, and a lab whose 20-odd developers focus on making data more accessible.

Sunlight’s first API, launched in 2007, leveraged work by Joshua Tauberer, a software developer who in 2004 launched GovTrack.us to aggregate federal legislative data, to create a site that helps people find and contact their congressional representatives. “It’s now evolved into something that provides better results than official government sites since we offer geographical lookups based on latitude and longitude,” asserts Tom Lee, director of Sunlight Labs. More recently, Sunlight has released APIs that track the development of legislation and can detect, for example, when legislative language moves between different statehouses or from interest groups into law.
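One common way to spot shared language between two bills is to compare overlapping runs of words. The sketch below illustrates that general idea only; it is not Sunlight's implementation, and the function names, the five-word window, and the sample bill texts are all assumptions made up for this example.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=5):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


# Hypothetical texts, invented for illustration only.
model_bill = "No person shall operate a motor vehicle while using a handheld device"
state_bill = "In this state, no person shall operate a motor vehicle while using a handheld device."
unrelated = "The department shall publish an annual report on water quality in each county"

reused_score = overlap(model_bill, state_bill)
unrelated_score = overlap(model_bill, unrelated)
```

A high score suggests that one text was adapted from the other, while a score near zero suggests independent drafting. A production system would also need to handle long documents, boilerplate clauses, and paraphrase, none of which this sketch attempts.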

At the federal level, activity has focused on aggregating the data published by various agencies and trying to make it useful to developers and the public. The Obama administration has been a vocal supporter of open data, and in 2009, Vivek Kundra, then federal chief information officer, launched Data.gov, a site whose goal is “to improve access to federal data and expand creative use of those data beyond the walls of government.” At the time of its launch, Data.gov had 47 datasets; at press time, the site boasted nearly 400,000 datasets.

In recent years, the Data.gov team has facilitated the organization of issues-based communities, such as people with an interest in energy or health. “Then, we can put out a call to the community and find out what they need, and go to the relevant agency and try to get it for them,” explains Jeanne Holm, who joined Data.gov as a data evangelist in 2010. “Each community can spark the publication of hundreds of thousands of datasets that wouldn’t have been published otherwise, and bring people together around those conversations.”

Communities can also sound the alert when datasets do not line up. Sunlight wrote scripts to reconcile the data at USASpending.gov with the Catalog of Federal Domestic Assistance and discovered a discrepancy of $1.3 trillion. It then contacted agencies, like Health & Human Services, whose data was inaccurate. “We brought them all this negative attention, but by and large, people in government are trying to do a ton with very limited resources, and what they need are better tools,” says Lee.
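A reconciliation of this kind comes down to comparing the totals that two sources report for the same program and flagging the differences. The following is a minimal sketch of that comparison, not Sunlight's actual scripts; the function name, the program identifiers, and the dollar amounts are hypothetical values invented for illustration.

```python
def reconcile(reported, reference, tolerance=0):
    """Compare per-program totals from two sources.

    Returns a list of (program, reported_total, reference_total, difference)
    tuples for every program whose totals differ by more than `tolerance`.
    A program missing from one source is treated as a total of 0 there.
    """
    mismatches = []
    for program in sorted(set(reported) | set(reference)):
        a = reported.get(program, 0)
        b = reference.get(program, 0)
        if abs(a - b) > tolerance:
            mismatches.append((program, a, b, a - b))
    return mismatches


# Toy figures in whole dollars; both datasets are made up for this example.
spending_site = {"P-001": 500, "P-002": 1200, "P-003": 0}
program_catalog = {"P-001": 500, "P-002": 900, "P-003": 750, "P-004": 50}

mismatches = reconcile(spending_site, program_catalog)
total_discrepancy = sum(abs(diff) for _, _, _, diff in mismatches)
```

Summing absolute differences, rather than signed ones, keeps over-reporting in one program from canceling out under-reporting in another.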

The British government’s counterpart to Data.gov was launched in January 2010 at Data.gov.uk. Built under the direction of Nigel Shadbolt, a professor of artificial intelligence at the University of Southampton, and Sir Tim Berners-Lee, Data.gov.uk now offers more than 5,400 datasets, including detailed crime statistics, government spending data, and health indicators like hospital infection rates.

To date, dozens of apps have been created that enable citizens to do everything from researching and comparing nursing-care facilities to viewing scheduled road closures across the country. Shadbolt and Berners-Lee, director of the World Wide Web Consortium, are now leveraging their work on the site to create a new government research center, the Open Data Institute. “The mission is to extract not just social value or efficiency for the public sector, but actual economic value in terms of businesses that might develop and promote and use data,” says Shadbolt, who is crafting the Institute’s implementation plan.

Indeed, getting companies and citizens actively involved is one of the biggest promises of open data, according to advocates. The discussion is often framed in the larger context of what is known as Gov 2.0: going beyond data publication to crowdsourced information solutions. “There’s an interesting question about whether we can build enough capability and infrastructure to allow people to be creative around these assets,” says Shadbolt. SeeClickFix, a Web site and mobile app that enables U.S. citizens to report non-emergency neighborhood issues like potholes and streetlight outages, is a popular example. Born of Web designer Ben Berkowitz’s frustration with the graffiti on his house in New Haven, CT, it has since grown into an 11-person start-up that works with city governments across the country to manage quality-of-life service requests and get usage analytics.

“We can turn on the hydrants of data, and that’s great for developers and great for a lot of companies, but for the average person on the street, it impacts them when they see an application that is of utility to them,” says Shadbolt, who cites the example of a transit app that tells commuters when the next bus will arrive. “So I have information that has delivered an immediate decision-making benefit, and that’s what people understand. It’s data that’s turned into information that, in turn, becomes knowledge because you’re using it to take an action. It’s a classic computer science view.”

Outreach remains a challenge, however, and critics often charge that Gov 2.0 empowers the already empowered, such as the tech-savvy people who do not need to be convinced that open data can improve government. Yet data advocates try to engage under-served populations.

“There’s sometimes the perception that if you do a mobile app, you’re hitting the wealthier members of the community,” says Code for America’s Pahlka. “But mobile is an incredibly important strategy if you’re looking at low-income communities.”

Pahlka cites a study by New York City’s Department of Social Services, which found that more than 80% of the people who visited its facilities were regular cellphone users, and that 35% of them owned smartphones. SMS-based apps are another way to broaden accessibility and adoption, as are targeted outreach campaigns. “It’s not about a broad advertising campaign for users that are already in the know,” says Pahlka. “It’s about partnering with cities to reach the people who need these services. If you’re targeting users of social services, advertise to them in the department during the transactions.”

In spite of the progress that has been made, changing government culture also remains a challenge. Due to procurement processes and well-established power structures, the efficient release and management of data can be difficult. “The instinct to protect data generally comes from a place of public service,” asserts Pahlka. “What we try to do is illustrate that things have changed, and that the benefits outweigh the risks.”

In Boston, Code for America’s fellows tried in vain to persuade the city to release data collected by public school buses’ GPS devices. When winter hit and multiple snowstorms stranded many school buses, concerned parents overloaded schools with phone calls asking about their children. “And the fellows said, ‘It’s not hard for us to write an app that puts this data on the Web so parents can look it up themselves,'” says Pahlka. “Sometimes it’s not enough to explain why you want the data. You have to make it work for the people in the system.”

Further Reading

Howard, A.
Gov 2.0 goes local, O’Reilly Radar, Oct. 15, 2010.

Lohr, S.
The age of big data, The New York Times, Feb. 11, 2012.

Say, M.
Government plans open data push, The Guardian, Nov. 28, 2011.

Sterne, R.
Data-driven innovation: How open government is transforming New York City, O’Reilly Strata Conference, New York, NY, Sept. 22–23, 2011.

Van Buskirk, E.
Sneak peek: Obama administration’s redesigned Data.gov, Wired Epicenter, May 19, 2010.


UF1 Figure. New York City’s chief digital officer Rachel Sterne, right, at the opening press breakfast during the first day of Internet Week NY, which occurred in June 2011.
