The sports industry is big business globally and domestically;10 for example, the New York Knicks of the National Basketball Association (NBA) generated $287 million in revenue in 2013.2 In order for sports organizations to maximize their financial performance, they must win on the court. The operational staffs, including coaches and general managers, must consistently make the right decisions despite many constraints, including a league-imposed salary cap and team budgets. Sports analytics plays an increasingly important role in such decisions.
Sports analytics traditionally involves statistical techniques for analyzing historical player performance. General managers have used it to build their rosters and coaches have used it in conjunction with their domain knowledge to adjust lineups and improve players' on-court performance. Though ongoing sports analytics research and practices center mostly on the structured data of player profiles and historical performance,1 this article explores the extent to which NBA teams can use "unstructured" social media data to further their sports analytics efforts. This novel focus is motivated by the prevalence of social media analytics in all kinds of business domains over the past five years. Specifically, our objective is to show how NBA players' pre-game emotional state, as captured through their tweets, or the messages they post on Twitter, before a game can help predict on-court performance in the game.
The following letter was published in the Letters to the Editor of the February 2016 CACM (http://cacm.acm.org/magazines/2016/2/197433).
The article "Hidden In-Game Intelligence in NBA Players' Tweets" by Chenyan Xu et al. (Nov. 2015) lacked, in my opinion, a complete understanding of the topics it covered. The measures it cited were not adequately reported; for example, not clear was what the dependent variable consisted of, so readers were unable to judge what the coefficients mean or the adequacy of a 1% adjusted R² in Table 5, an effect size that was most likely meaningless.
Moreover, the sample size was not explained clearly. There were initially 91,659 tweets in the sample, and 266 players tweeted at least 100 tweets during the season in question. Other than in a small note in Table 1, the article did not mention there are only 82 games in a regular NBA season, resulting in at least 1.22 tweets per game for those 266 players; this is not an appropriate sample size, and the distribution is most likely a long tail. With 353 players tweeting and 82 games, the sample size should be 28,946 player-games (the unit of analysis), yet the reported sample size was a fraction of that: 3,443 or 3,344. That would be fewer than 10 games per player and not an adequate sample size.
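The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in this letter; the variable names are illustrative and do not come from the article.

```python
# Figures quoted in the letter; all names here are illustrative assumptions.
GAMES_PER_SEASON = 82
MIN_TWEETS_PER_PLAYER = 100
PLAYERS_TWEETING = 353
REPORTED_OBSERVATIONS = (3_443, 3_344)

# Lower bound on average tweets per game for a player with >= 100 tweets.
tweets_per_game_floor = MIN_TWEETS_PER_PLAYER / GAMES_PER_SEASON

# Upper bound on player-games if every tweeting player appeared in every game.
max_player_games = PLAYERS_TWEETING * GAMES_PER_SEASON

# Average observed games per player, and share of the upper bound covered.
games_per_player = [n / PLAYERS_TWEETING for n in REPORTED_OBSERVATIONS]
coverage = [n / max_player_games for n in REPORTED_OBSERVATIONS]

print(f"tweets per game (floor): {tweets_per_game_floor:.2f}")
print(f"maximum player-games:    {max_player_games:,}")
for n, g, c in zip(REPORTED_OBSERVATIONS, games_per_player, coverage):
    print(f"n = {n:,}: {g:.2f} games per player, {c:.1%} of the maximum")
```

Under these figures, either reported sample covers roughly 12% of the possible player-games.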
Also unclear was if players with more tweets before a game can have higher emotion scores, as this measure seems to be an aggregate; the article said, "The total score represents a player's mood . . . The higher the aggregated score, the more positive the player's mood," emphasis added. More tweets do not mean more emotion. The article also did not address if there is a difference between original tweets and replies to other tweets.
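The concern about aggregation can be illustrated with a minimal sketch. The per-tweet scores below are made up, and `total_score` and `mean_score` are hypothetical helpers; this does not reproduce the article's actual scoring procedure.

```python
def total_score(tweet_scores):
    """Sum of per-tweet sentiment scores (grows with tweet count)."""
    return sum(tweet_scores)


def mean_score(tweet_scores):
    """Average per-tweet sentiment score (independent of tweet count)."""
    return sum(tweet_scores) / len(tweet_scores) if tweet_scores else 0.0


# Hypothetical pre-game tweets: many mildly positive vs. few strongly positive.
frequent_tweeter = [1, 1, 1, 1, 1, 1]
occasional_tweeter = [2, 2]

for name, scores in [("frequent", frequent_tweeter),
                     ("occasional", occasional_tweeter)]:
    print(name, total_score(scores), mean_score(scores))
```

With a summed score the frequent tweeter appears to be in the better mood; with a per-tweet average the ranking reverses, which is why the choice of aggregation matters.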
The article also made a huge assumption about the truthfulness of tweets. NBA players are performers and know their tweets are public. The article dismissed this, saying, "Its confounding effect is minimal due to players' spontaneous and genuine use of Twitter," yet offered no evidence, whether statistical, theoretical, or factual, that this is so.
The article coded angry emoticons (such as >:-o) to the negative mood condition, as in Table 3. This emoticon-mood mapping is incorrect, as anger can be positively channeled into focus and energy on the court. Smileys and frowns were given a weighting of +/-2 on a scale of +5 to -5, but not explained was why this is theoretically defensible.
NBA coaches do not seek to maximize performance at the level of an individual player but at the level of a team as a whole across an entire game and season. Bench players usually cannot replace starting players; the starters start for very good reasons.
The article's conclusion said the authors had analyzed 91,659 tweets, yet footnote b said, "Of the 51,847 original posts, 47,468 were in English," implying they analyzed at most 87,280 tweets. Restating the number 91,659 was itself misleading, as tweets were not the unit of analysis (player-games were), and the authors had only 3,443 such observations, at most.
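The 87,280 figure follows from subtracting the non-English original posts from the total, as this short check shows. It assumes, as the letter does, that non-English posts were excluded from the analysis; the variable names are illustrative.

```python
# Counts quoted from the article's conclusion and footnote b.
total_tweets = 91_659
original_posts = 51_847
english_original_posts = 47_468

# Non-English original posts, assumed excluded from the analysis.
non_english_posts = original_posts - english_original_posts

# Upper bound on the number of tweets that could have been analyzed.
max_analyzed = total_tweets - non_english_posts

print(non_english_posts, max_analyzed)
```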
The one claim reviewers and editors should definitely have caught is in footnote c: A metric that can capture the unquantifiable? I am so speechless I might have to use an emoticon myself.
Our study explored whether and how NBA players' tweets can be used to extract information about players' pre-game emotional state (X) based on the psychology and sports literature and how it might affect players' in-game performance (Y). To generate X for a player before a game, we purged pure re-tweets, information-oriented tweets, and non-English tweets. Based on the remaining valid tweets, we then extracted, aggregated, and normalized the data, as in Table 5. We still find it intriguing X explains up to 1% of the total variations in Y, whereas other standard variables explain only 4%.
Chenyan Xu, Galloway, NJ
Yang Yu, Rochester, NY
Chun K Hoi, Rochester, NY