Third Workshop on Search and Social Media (SSM 2010)

It is my pleasure to report on the 3rd Annual Workshop on Search in Social Media (SSM 2010), a gathering of information retrieval and social media researchers and practitioners in an area that has captured the interest of computer scientists, social scientists, and even the broader public. The one-day workshop took place at the Polytechnic Institute of NYU in Brooklyn, NY, co-located with the ACM Conference on Web Search and Data Mining (WSDM 2010). The quality of the presenters, the overbooked registration, and the hundreds of live tweets with the #ssm2010 hashtag all attest to the success of this event.

The workshop opened with a warm welcome from Ian Soboroff (NIST), immediately followed by a keynote from Jan Pedersen, Chief Scientist of Bing Search. Jan established a clear business case for search in social media: the opportunity to deliver content that is fresh, local, and under-served by general web search. He drilled into particular types of content where social media search is most useful: expert opinions, breaking news, and tail content. The benefits of social media search include trust and personal interaction (as compared to web content that is often soulless and of uncertain provenance), low latency (though perhaps at the cost of accuracy), and access to niche or ephemeral information that web search rarely surfaces. But delivering social media results to searchers creates its own variety of challenges, such as weighing freshness against accuracy and relevance, coping with loss of social content's conversational context, managing low update latency when search engines have not been optimized for it, and fighting new kinds of spam. Despite these challenges, it is clear that the major web search engines have embraced the brave new world of real-time social content.

Eugene Agichtein (Emory University) then moderated a panel representing the world's leading search engines: Jeremy Hylton (Google), Matthew Hurst (Microsoft), Sihem Amer-Yahia (Yahoo!), and William Chang (Baidu). Jeremy justified the universal interface approach, pointing out that users don't want to have to figure out what kind of search site to use for their queries, and that they expect a familiar interface. He also noted that Google has made great strides on update latency: it can index the Twitter firehose in the same amount of time as serving a query. Matthew offered various analyses of the social search problem, based on whether the information signal resides in content (e.g., web) or attention (e.g., Twitter), or whether the information need is expressed in an explicit search query or inferred from the user's context. Sihem offered a counterpoint to Jeremy, arguing that social media search queries often represent broad or vague information needs, and thus call for a more browsing-oriented interface than web search, which is optimized for highly specific needs. William noted that the biggest competitive threat he sees for web search engines comes from social media players, and he credits much of Baidu's success to its surfacing of social media content.

Then came a flurry of questions, perhaps the most interesting of which was how to address identity management. William argued that people prefer interacting with real-named (or pseudonymous) people to whom they are directly connected. Sihem offered the counter-example of obtaining recommendations through community aggregation. Matthew noted the incongruity of there being no economic relationship between the social network companies that maintain proprietary social graphs and the people whose identities and relationships those graphs represent. Jeremy pointed out that users benefit if the data is as open as possible.

Given the almost even split between academic and industry participation in the workshop, the panelists were also asked to present research challenges to academia. Jeremy posed the problem of determining when social media results are actually true. Matthew wants to see more interdisciplinary work between computer scientists and social scientists. Sihem offered two challenge problems: scalable community discovery and evaluation of collaborative recommendation systems. William wants to see a rigorous axiomatization of social media search behavior.

After lunch, Jeremy Pickens (FXPAL) moderated a panel representing social media / networking companies: Hilary Mason (bit.ly), Igor Perisic (LinkedIn), and David Hendi (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person's extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of faceted search, I was glad to see him touting LinkedIn's success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend "LinkedIn Search: A Look Beneath the Hood".

The afternoon continued with a poster / demo session emphasizing work in progress: tools, interfaces, research studies, and position papers. I particularly enjoyed listening to the stream of interaction between academic researchers and industry practitioners.

The final panel session assembled academic researchers to discuss their views of the challenges in social media. Gene Golovchinsky (FXPAL) moderated a panel composed of Meena Nagarajan (Wright State University), Liangjie Hong (Lehigh University), Richard McCreadie (University of Glasgow), Jonathan Elsas (CMU), and Mor Naaman (Rutgers University). Meena highlighted the need to build up meta-data to describe the context around social utterances. Liangjie took a position similar to William Chang's, calling for a framework to model the tasks and behavior of users who interact with social media. Richard focused on the intersection of social media and news search, and noted that some of the most useful information is private and proprietary (e.g., search and chat logs). Jonathan offered a variety of challenges: determining the right retrieval granularity, managing multiple axes of organization, aggregating author behavior, and multidimensional indexing of social media content. Finally, Mor noted that we're moving from a world of email to a "social awareness stream", in which we direct content at a group and have lower expectations of readership than with email. As with all of the panels, there were countless questions from the moderator and audience, particularly about determining the truthfulness of social media content and delivering social content in an effective user interface.

The final session was a full-group discussion that dived into the various topics addressed throughout the day. But Gene Golovchinsky provided the "one more thing" at the end, showing us a glimpse of a faceted search interface to explore a Twitter stream. It was an elegant finish to a day filled with informative and engaging discussion, and I look forward to seeing many of the participants at the WSDM conference over the next few days.