BLOG@CACM
Artificial Intelligence and Machine Learning

AlphaGo Is Not the Solution to AI

John Langford, Microsoft Research New York

Congratulations are in order for the folks at Google DeepMind who have mastered Go.

However, some of the discussion around this seems like giddy overstatement. Wired says "Machines have conquered the last games" and Slashdot says "We know now that we don’t need any big new breakthroughs to get to true AI." The truth is nowhere close.

For Go itself, it has been well-known for a decade that Monte Carlo tree search (i.e., valuation by assuming randomized playout) is unusually effective in Go. Given this, it is unclear that the AlphaGo algorithm extends to other board games where MCTS does not work so well. Maybe? It will be interesting to see.
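To make "valuation by assuming randomized playout" concrete, here is a minimal sketch on a toy game rather than Go. It is only the playout-valuation ingredient (sometimes called flat Monte Carlo), not a full tree search and not AlphaGo's method. The game (single-pile Nim, take 1 to 3 stones, whoever takes the last stone wins) and all function names are assumptions chosen for illustration.

```python
import random

def random_playout(stones, to_move, rng):
    """Play uniformly random legal moves until the pile is empty.
    Returns the winner (0 or 1): the player who takes the last stone."""
    while True:
        take = rng.randint(1, min(3, stones))
        stones -= take
        if stones == 0:
            return to_move
        to_move = 1 - to_move

def playout_value(stones, take, player, n_playouts, rng):
    """Value a move as the fraction of random playouts that `player` wins."""
    remaining = stones - take
    if remaining == 0:
        return 1.0  # taking the last stone wins immediately
    wins = sum(
        random_playout(remaining, 1 - player, rng) == player
        for _ in range(n_playouts)
    )
    return wins / n_playouts

def best_move(stones, player=0, n_playouts=2000, seed=0):
    """Pick the move with the highest estimated playout value."""
    rng = random.Random(seed)
    values = {
        take: playout_value(stones, take, player, n_playouts, rng)
        for take in range(1, min(3, stones) + 1)
    }
    return max(values, key=values.get), values

move, values = best_move(3)
print(move, values)
```

With 3 stones left, taking all 3 wins outright and taking 2 always loses, so the estimate picks the right move here. On larger piles, purely random playouts can misjudge moves, which is one reason the effectiveness of this kind of valuation depends so much on the game.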

Delving into existing computer games, the Atari results (see figure 3) are very fun but obviously unimpressive on about ¼ of the games. My hypothesis for why is that their solution does only local (epsilon-greedy style) exploration rather than global exploration, so they can only learn policies for problems that either have very short credit assignment or have greedily accessible policies. Global exploration strategies are known to result in exponentially more efficient strategies in general for deterministic decision processes (1993), Markov Decision Processes (1998), and for MDPs without modeling (2006).
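The gap between local and global exploration can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not the Atari setup and not the algorithm from any of the papers cited above: the chain environment, its rewards, the constant `N`, and both agents are hypothetical. Retreating pays a small immediate reward, while the large reward requires `N` consecutive advances. The first agent is tabular Q-learning with epsilon-greedy exploration; the second keeps a table of observed transitions and always heads for the nearest untried state-action pair.

```python
import random
from collections import deque

N = 8             # chain length (hypothetical toy value)
ACTIONS = (0, 1)  # 0 = retreat to start for a small reward, 1 = advance

def step(state, action):
    """Deterministic chain: only N consecutive advances earn the big reward."""
    if action == 1:
        if state + 1 == N:
            return 0, 1.0   # reached the end: big reward, back to start
        return state + 1, 0.0
    return 0, 0.01          # retreat: small, greedily attractive reward

def epsilon_greedy_steps(eps=0.1, alpha=0.5, gamma=0.9, max_steps=50_000, seed=0):
    """Q-learning with epsilon-greedy exploration. Returns the number of
    steps until the big reward is first seen, or None if it never is."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N)]
    state = 0
    for t in range(1, max_steps + 1):
        if rng.random() < eps:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[state][a])
        nxt, reward = step(state, action)
        if reward == 1.0:
            return t
        q[state][action] += alpha * (reward + gamma * max(q[nxt]) - q[state][action])
        state = nxt
    return None

def first_action_toward_unknown(model, state):
    """Breadth-first search over known transitions for the nearest state with
    an untried action; returns the first action on the path to it."""
    queue = deque([(state, None)])
    seen = {state}
    while queue:
        s, first = queue.popleft()
        for a in ACTIONS:
            if (s, a) not in model:
                return a if first is None else first
        for a in ACTIONS:
            s2 = model[(s, a)]
            if s2 not in seen:
                seen.add(s2)
                queue.append((s2, a if first is None else first))
    return None  # every reachable state-action pair has been tried

def systematic_steps(max_steps=50_000):
    """Global exploration: always move toward the nearest untried pair."""
    model, state = {}, 0
    for t in range(1, max_steps + 1):
        action = first_action_toward_unknown(model, state)
        if action is None:
            return None
        nxt, reward = step(state, action)
        if reward == 1.0:
            return t
        model[(state, action)] = nxt
        state = nxt
    return None

print("systematic:", systematic_steps())
print("epsilon-greedy:", epsilon_greedy_steps())
```

In this construction the systematic explorer reaches the big reward in N(N-1)/2 + 2N steps, which is 44 for N = 8. The epsilon-greedy agent's greedy choice is always to retreat, so it needs N exploratory advances in a row, each with probability eps/2, and the expected wait grows roughly like (2/eps)^N. Note that the systematic explorer relies on a table of transitions, which is exactly the tabular assumption that does not carry over directly to function fitting.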

These strategies are not used because they are based on tabular learning rather than function fitting. That is why I shifted to Contextual Bandit research after the 2006 paper. We have learned quite a bit there, enough to start tackling a Contextual Deterministic Decision Process, but that solution is still far from practical. Addressing global exploration effectively is only one of the significant challenges between what is well known now and what needs to be addressed for what I would consider a real AI.

This is generally understood by people working on these techniques but seems to be getting lost in translation to public news reports. That is dangerous because it leads to disappointment. The field will be better off without an overpromise/bust cycle, so I would encourage people to keep and inform a balanced view of successes and their extent. Mastering Go is a great accomplishment, but it is quite far from everything.

Edit: Further discussion here.
