Even as the European Union implements strict new data protection regulations, China's data trading and sharing markets are booming. Here, we survey these developing markets, driven by growing demand from artificial intelligence (AI)-related industries, covering government encouragement as well as critical concerns and research opportunities, including privacy and security.
China, home to the world's largest e-commerce and mobile payment markets,a had a big-data market estimated at $70B circa 2015, projected to grow to $155B by 2020.2 As in much of the world, over 80% of data in China is held privately by governments and companies, restricting its use for productivity and profit. On December 8, 2017, President Xi Jinping emphasized the importance of open data sharing and fusion as part of China's national big-data strategy, encouraging data sharing across government departments and local governments, as well as data sharing and trading between governments and private companies.
The founder of Fa Yuan Di Ltd. put the size of China's data trading market at approximately $3.2B in 2016, and the market is estimated to grow to $8.7B by 2020.1 Examples of data trading and sharing include:
Twenty data markets have been established by local government authorities and private enterprises in China (see table at left). They trade whole datasets, Web crawlers, APIs, and analytical results, either off-the-rack or customized, drawn from many sectors including banking, energy, health care, transportation, industry, agriculture, tourism, education, and telecommunications. One of the largest is the Global Big Data Exchange in Guiyang, with over 2,000 corporate members and more than 150PB of reported stored data circa March 2018.b
Following the lead of the U.S. and other countries, Chinese policy encourages governments to share data to enhance transparency and efficiency. The State Council's Guidelines for Promoting Big Data Development, released in 2015, set the goal of establishing a unified platform for open government data by the end of 2018, with top priority given to data from several key realms, including credit, transportation, and health care. The Chinese government also encourages private data trading to expand the digital ecosystem. There are as yet no regulations specific to data sharing and trading, apart from those protecting national security, trade secrets, and copyrights, and banning the propagation of illegal content (such as terrorist material and fake news).2 This lack of regulation allows experimentation, but some data owners hesitate to share their data because of potential legal or business consequences. Personal data is openly traded, and its sharing and trading have become a topic of public discussion. Whereas the EU's General Data Protection Regulation (GDPR) governs how companies collect, process, and sell their consumers' data, China has no national data protection regulation; only fragmented rules exist, such as the Cyber Security Law and the Personal Data Infringement Interpretation, which came into effect in June 2017.2
A GfK surveyc indicated that 38% of people in China are very willing to share personal data in exchange for better service, compared with 25% in the U.S. and less than 20% in most European countries. However, awareness and concern are growing in China. In March 2018, a survey conducted jointly by China Central Television and Tencent Research found that 76.3% of the 8,000 Chinese people interviewed were worried about the threat AI poses to their privacy. A few days later, Baidu's CEO said Chinese people are willing to sacrifice privacy in exchange for convenience, triggering an enormous public backlash.d Compared with the EU and the U.S., however, China remains a more lenient, permissive environment for data trading and sharing.
Data markets are growing in China but remain immature, with most datasets small in scale, poor in quality, and low in value. Critical concerns inhibit data exchange, as depicted in the figure here.
Preprocessing. To increase usability, sellers must preprocess data through cleaning, labeling, reconciliation, fusion, and desensitization, steps that require automation at big-data scale. Yet manual approaches are still common: behind China's booming AI industry are almost one million data labelers, mostly rural, part-time workers.e
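The preprocessing steps above can be illustrated with a minimal Python sketch that cleans, deduplicates, and masks records before listing. It assumes a hypothetical record layout (`name`, `phone`, `id_number`) and hypothetical masking rules (keep the first character of a name, the first three and last four digits of a phone number, and the last four characters of an ID number); real desensitization standards differ, and masking alone does not guarantee anonymity.

```python
import re
from typing import Dict, Iterable, List

Record = Dict[str, str]  # hypothetical flat record: field name -> value


def clean(record: Record) -> Record:
    """Trim whitespace and normalize phone numbers to bare digits."""
    out = {k: v.strip() for k, v in record.items()}
    if "phone" in out:
        out["phone"] = re.sub(r"\D", "", out["phone"])
    return out


def mask(value: str, keep_head: int, keep_tail: int) -> str:
    """Replace the middle of a string with '*', keeping head/tail characters."""
    if len(value) <= keep_head + keep_tail:
        return "*" * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * middle + value[len(value) - keep_tail:]


def desensitize(record: Record) -> Record:
    """Apply the (hypothetical) per-field masking rules."""
    out = dict(record)
    if "name" in out:
        out["name"] = mask(out["name"], 1, 0)
    if "phone" in out:
        out["phone"] = mask(out["phone"], 3, 4)
    if "id_number" in out:
        out["id_number"] = mask(out["id_number"], 0, 4)
    return out


def preprocess(records: Iterable[Record]) -> List[Record]:
    """Clean, drop exact duplicates (after cleaning), then desensitize."""
    seen = set()
    result: List[Record] = []
    for rec in records:
        cleaned = clean(rec)
        key = tuple(sorted(cleaned.items()))
        if key in seen:
            continue
        seen.add(key)
        result.append(desensitize(cleaned))
    return result
```

Deduplicating before masking is deliberate: masked values can collide (two different phone numbers may both become `138****5678`), so deduplicating afterwards could wrongly merge distinct people.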
Pricing. Because data products can be copied at almost no marginal cost, their supply is effectively unlimited, driving the Arrow-Debreu equilibrium price close to zero.5 Early work has explored data pricing,4,6 but it remains an open problem. Today, standard whole datasets are sold at fixed rates, and customized data is priced by negotiation. Pricing strategies for APIs include pay-as-you-go and wholesale.
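To make the two API pricing strategies concrete, the sketch below compares a buyer's cost under pay-as-you-go and wholesale (bundled) pricing. The plan classes, prices, and bundle size are hypothetical; prices are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class PayAsYouGo:
    """Buyer pays a fixed price for every API call (integer cents)."""
    cents_per_call: int

    def cost(self, calls: int) -> int:
        return self.cents_per_call * calls


@dataclass(frozen=True)
class Wholesale:
    """Buyer prepays bundles of calls; a partly used bundle is paid in full."""
    calls_per_bundle: int
    cents_per_bundle: int

    def cost(self, calls: int) -> int:
        bundles = -(-calls // self.calls_per_bundle)  # ceiling division on ints
        return self.cents_per_bundle * bundles


Plan = Union[PayAsYouGo, Wholesale]


def cheapest_plan(calls: int, plans: Dict[str, Plan]) -> str:
    """Name of the plan with the lowest total cost for `calls` API calls."""
    return min(plans, key=lambda name: plans[name].cost(calls))
```

For example, with 2 cents per call versus 15,000 cents per 10,000-call bundle, wholesale is cheaper at 9,000 calls (15,000 vs. 18,000 cents), while pay-as-you-go is cheaper at 10,001 calls (20,002 vs. 30,000 cents) because a second bundle must be bought in full. Neither price is anchored to the seller's near-zero marginal cost, which is part of why data pricing remains open.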
Security. Data underpins the information asymmetry between companies and between government departments. Inappropriate sharing may weaken an owner's competitiveness, expose wrongdoing, or harm its public image. Most data is sold through APIs, but many sellers worry that buyers can infer the underlying data content from query results. Techniques such as query auditing can help ensure data security and alleviate sellers' concerns. Data transactions should also be tracked to achieve accountability.3 Many platforms, such as the Global Big Data Exchange (Guiyang) and JD Wan Xiang, have adopted blockchain to strengthen trading security because of its decentralized, tamper-resistant properties.
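Query auditing can be sketched for the simplest case: SUM queries over a private numeric column. The hypothetical `SumQueryAuditor` below implements one classical form, full-disclosure auditing, which denies a query if it, combined with previously answered queries, would let the buyer solve exactly for any single value (that is, if some unit vector enters the row space of the 0/1 query matrix). It uses exact rational arithmetic. This is an illustrative assumption, not a technique attributed to any specific marketplace, and it does not prevent partial disclosure such as tight bounds on a value.

```python
from fractions import Fraction
from typing import Iterable, List, Optional, Sequence

Row = List[Fraction]


def rref(rows: List[Row]) -> List[Row]:
    """Return the nonzero rows of the reduced row echelon form of `rows`."""
    m = [row[:] for row in rows]
    if not m:
        return []
    pivot_row = 0
    for col in range(len(m[0])):
        sel = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if sel is None:
            continue  # no pivot in this column
        m[pivot_row], m[sel] = m[sel], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [v / piv for v in m[pivot_row]]  # scale pivot to 1
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m[:pivot_row]


class SumQueryAuditor:
    """Answers SUM queries, denying any that would pin down a single value."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = list(values)
        self._answered: List[Row] = []  # 0/1 indicator rows of answered queries

    @staticmethod
    def _reveals_single_value(rows: List[Row]) -> bool:
        # A unit vector e_i lies in the row space iff some RREF row equals e_i,
        # i.e. has exactly one nonzero entry (already scaled to 1 by rref).
        return any(sum(1 for v in row if v != 0) == 1 for row in rref(rows))

    def query(self, indices: Iterable[int]) -> Optional[int]:
        """Return the SUM over `indices`, or None if the query is denied."""
        chosen = set(indices)
        if any(not 0 <= i < len(self._values) for i in chosen):
            raise IndexError("query index out of range")
        q = [Fraction(1 if i in chosen else 0) for i in range(len(self._values))]
        if self._reveals_single_value(self._answered + [q]):
            return None  # denial depends only on query sets, never on the values
        self._answered.append(q)
        return sum(self._values[i] for i in chosen)
```

Because the deny decision depends only on which records each query covers and never on the data values, a denial does not itself leak information about the data.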
Privacy. Before being traded, data must be desensitized to protect personal information and privacy. Sellers should also account for potential privacy leaks when data is linked across time, across locations, or across different owners. Although China lacks strict regulations like the GDPR, most trading markets in China claim to desensitize personal data. For instance, Shanghai Data Exchange Corp. protects personally identifiable information through encryption and encoding when it performs data linkage.
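Privacy-preserving linkage can be illustrated with keyed pseudonymization: each owner replaces raw identifiers with HMAC-SHA256 tokens under a shared secret key, so datasets can be joined on tokens without exchanging raw identifiers. This is a swapped-in illustration, not the Shanghai Data Exchange's documented method; the key holder (assumed here to be the trading broker), the sample rows, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a raw identifier with a keyed HMAC-SHA256 token (hex string)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize(rows: List[Tuple[str, str]], key: bytes) -> Dict[str, str]:
    """Turn one owner's (identifier, attribute) rows into {token: attribute}."""
    return {pseudonymize(ident, key): attr for ident, attr in rows}


def link(left: Dict[str, str], right: Dict[str, str]) -> List[Tuple[str, str]]:
    """Join two tokenized datasets on matching tokens; raw IDs never appear."""
    return [(left[t], right[t]) for t in sorted(left.keys() & right.keys())]
```

Keyed hashing only protects identifiers from parties without the key: phone numbers have a small value space, so anyone holding the key could recover them by enumeration, and the joined attributes may still permit re-identification.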
Verifiability. In some cases traded data has been forged, eroding trust. When sellers list their datasets on a marketplace, they must prove to the trading broker their ownership of the data, its authenticity and accessibility, and that its content and quality are as claimed, all without the proof disclosing the data content. When buyers purchase API access to a dataset, sellers must prove the API results were computed correctly over exactly this dataset, which is extremely difficult if buyers also want query privacy.
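The most basic building block of verifiability, committing to an exact dataset so later claims can be checked against it, can be sketched with a Merkle tree: the seller publishes only the root hash and can later prove that a given record belongs to the committed dataset by revealing that record plus a logarithmic number of sibling hashes. This is an assumed illustration; it proves membership relative to a published root, not ownership, authenticity, quality, or correct API computation, which require much heavier machinery. The leaf and node prefixes follow the domain-separation convention of RFC 6962; a real deployment would also salt leaves so low-entropy records cannot be guessed from their hashes.

```python
import hashlib
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, sibling is on the left)


def leaf_hash(record: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + record).digest()  # 0x00 marks a leaf


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()  # 0x01 marks a node


def build_levels(records: List[bytes]) -> List[List[bytes]]:
    """All tree levels from leaves to root; an odd last node is promoted as-is."""
    if not records:
        raise ValueError("cannot commit to an empty dataset")
    level = [leaf_hash(r) for r in records]
    levels = [level]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
        levels.append(level)
    return levels


def merkle_root(records: List[bytes]) -> bytes:
    """The commitment the seller publishes."""
    return build_levels(records)[-1][0]


def inclusion_proof(records: List[bytes], index: int) -> Proof:
    """Sibling hashes needed to recompute the root from records[index]."""
    levels = build_levels(records)
    if not 0 <= index < len(records):
        raise IndexError("record index out of range")
    proof: Proof = []
    idx = index
    for level in levels[:-1]:
        sib = idx ^ 1
        if sib < len(level):  # a promoted node has no sibling at this level
            proof.append((level[sib], sib < idx))
        idx //= 2
    return proof


def verify(record: bytes, proof: Proof, root: bytes) -> bool:
    """Check that `record` is committed under `root`."""
    h = leaf_hash(record)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root
```

Promoting an odd node instead of duplicating it keeps datasets such as `[a, b, c]` and `[a, b, c, c]` from sharing a root.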
While these challenges are daunting, strong government encouragement and abundant data collection are enabling the rapid growth of large-scale data exchanges in China. With advances in technology (blockchain, AI, big data analytics, cloud computing) and maturing policies (privacy, digital ethics), we are optimistic that better data sharing and trading ecosystems will help China's economy reach a new level of global competitiveness.
©2018 ACM 0001-0782/18/11
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from email@example.com or fax (212) 869-0481.