BLOG@CACM

Are Data Miners Ready to Hang Up the Hard Hat and Put on a Lab Coat?


This week, Chicago hosts the 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. More than 1,200 researchers and practitioners from academia and industry are converging to discuss, debate, and advance the field of knowledge discovery from data. A blend of theoretical and algorithmic advances, novel applications, and in-the-trenches experiences of applying data mining to real-world problems showcases the diversity of this vibrant community.


Quality Research and Rich Program

With a paper acceptance rate of just 17% in the Research track, KDD is one of the most competitive conferences in the computer science and statistics communities. Even more remarkable is how closely KDD integrates its Industry Government (IG) track with the Research track. The IG track papers present best-in-show applications, many of them deployed to solve real-world problems using existing machine learning techniques. These papers also contribute new paradigms that are widely applicable across datasets and domains. Here are this year’s best paper award winners:

  • Best Student Paper: “A Space Efficient Streaming Algorithm for Triangle Counting Using the Birthday Paradox” by Madhav Jha, C. Seshadhri, and Ali Pinar;

  • Best Research Track Paper: “Simple and Deterministic Matrix Sketching” by Edo Liberty;

  • Best Industry Government Track Paper: “Amplifying the Voice of Youth in Africa via Text Analytics” by Prem Melville, Vijil Chenthamarakshan, Richard Lawrence, IBM Research; James Powell, Moses Mugisha, Sharad Sapra, UNICEF Uganda; Rajesh Anandan, US Fund for UNICEF; Solomon Assefa, IBM Research.

Here are the KDD 2013 topic areas:

[Figure: KDD 2013 Topic Areas]

Real-World Problems, Applications and Solutions

The now-maturing Industry Practice Expo (IPE) track is making KDD even more of a must-attend event for big data practitioners. It provides a forum for a select group of senior, experienced data mining experts to share lessons learned from deploying data mining and machine learning solutions in a variety of industry settings. This year’s IPE track features eight world-renowned experts and hosts its first panel session, on a provocative subject: “Death of the expert? The rise of algorithms and decline of domain experts.”

Another outstanding feature of KDD is the prestige of its premier annual data mining contest, the KDD Cup. You probably remember the famous Netflix Prize, which also served as the basis for a KDD Cup. The idea of hosting a contest within a conference community to motivate research was born at SIGKDD and has since been adopted by many other SIGs and computer science forums. This year, KDD Cup 2013 featured a heterogeneous, complex, and noisy real-world dataset provided by Microsoft Academic Search. Over 800 teams duked it out to design accurate solutions for two challenges: (a) disambiguating duplicate author names, and (b) improving author-paper assignment. Students, faculty, and industry developers built and tested over 10,000 data mining models in a span of just two months! The challenge and the winning solutions/models are open source and widely available on the KDD Cup 2013 website. This year’s winner of both challenges was Team Algorithm from National Taiwan University.

A New Generation of Data Scientists

Although there are many parallels between past conferences and KDD 2013, I am noticing something unique at this year’s event. Attendees who in prior years would typically park themselves in one session for two hours are now moving around to attend talks across sessions! At first, I thought this might be a sporadic phenomenon, occurring only when people flock to a presentation by one of our ‘star’ speakers. After careful observation, I noticed a continuous stream of attendees selecting a variety of talks across the Research, Industry Government, and Industry Practice Expo tracks. This is great news for SIGKDD in particular, and for computer science in general. As theoretical advances share the podium with applied research, a new generation of data scientists is set to emerge, one in which hard hats and lab coats blend together into a sea of potential data-driven scientific advances that benefit every constituent of our community not just equally, but also equitably. Perhaps the best example of this blend of science and engineering is this year’s selection for the SIGKDD Doctoral Dissertation Awards: the winner focused on heterogeneous information networks, while the runner-up delved into applications of machine learning principles to health informatics:

  • SIGKDD Doctoral Dissertation Award Winner: Yizhou Sun, “Mining Heterogeneous Information Networks,” Advisor: Jiawei Han, University of Illinois at Urbana-Champaign
  • SIGKDD Doctoral Dissertation Award Runner-Up: Byron Wallace, “Machine Learning in Health Informatics: Making Better Use of Domain Experts,” Advisor: Carla Brodley, Tufts University


Data Science for Social Good is indeed the first step in that direction, but more on that later. Stay tuned for more from Chicago!

The full program for KDD 2013 is available on the conference website. You can follow along on Twitter via @kdd_news or the #kdd2013 hashtag.

 —Ankur Teredesai is Publicity co-chair for KDD 2013
