In Support of Open Reviews; Better Teaching Through Large-Scale Data Mining

http://cacm.acm.org/blogs/blog-cacm/100030
October 20, 2010

At ECSS 2010, the annual meeting of Informatics Europe, we heard a fascinating keynote by Moshe Vardi, editor-in-chief of Communications of the ACM, titled "The Tragedy of the Computing-Research Commons." Professor Vardi talked about the importance of engaging in activities that benefit the community even if they bring no huge immediate reward to the individuals who participate in them. He lamented the degradation of the computer science culture due, among other causes, to shoddy refereeing practices.

Lamenting about the reviewing process is common, and everyone has horror stories. Yet the simple solution is almost never considered: Turn the default refereeing mode to open, rather than anonymous.

BERTRAND MEYER: "Refereeing should be what it was before science publication turned into a business: scientists giving their polite but frank opinion on the work of other scientists."

Some cases may still justify anonymity, but they should be the exception, calling for a specific justification. Refereeing should be what it was before science publication turned into a business: scientists giving their polite but frank opinion on the work of other scientists. Anonymity just encourages power games, back stabbing, and, worst of all, poor quality: Since no one can call your bluff, you are not encouraged to be a good referee. Of course, many people do an excellent job anyway, but they do not necessarily prevail. In the highly competitive world of computer science publications—conference publication, in particular, with its schedule pressures—one incompetent but arrogant negative review typically outweighs five flattering and carefully considered analyses.

By revealing who you are, you force yourself to write reviews that you can defend.

More than two decades ago I started refusing to do anonymous reviews. This stance has not brought me only new friends (which may not be a big deal, as I am not sure that people who hate you because you found flaws in one of their papers are worth having as scientific friends), but it has certainly made me a better reviewer. In fact, it did bring me some friends: people who are grateful for having gained new insights, positive or negative, into their own work.

A more complete discussion and rationale can be found in this Web page, http://se.ethz.ch/~meyer/publications/online/whysign, to which I regularly refer editors asking for reviews. That text, written several years ago, is verbose and should be rewritten, but it does include the basic analysis.

The decision to perform open refereeing was personal and, until now, I have always refrained from proselytizing. Seeing the degradation in refereeing, however, I believe such reserve is no longer appropriate. Establishing open refereeing as the default strategy is the first step toward fixing the flawed culture of computer science refereeing.

Greg Linden "Massive-Scale Data Mining for Education"

http://cacm.acm.org/blogs/blog-cacm/101489
November 10, 2010

Let’s say, in the near future, tens of millions of students start learning math using online computer software. Our logs fill with a massive new data stream, millions of students doing billions of exercises, as the students work.

In these logs, we will see some students struggle with some problems, then overcome them. Others will struggle with those same problems and fail. There will be paths of learning in the data, some of which quickly reach mastery, others of which go off into the weeds.

At Amazon.com a decade ago, we studied the trails people made as they moved through our Web site. We looked at the probability that people would click on links to go from one page to another. We watched the trails people took through our site and where they went astray. As people shopped, we learned how to make shopping easier for others in the future.
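To make the trail analysis concrete, here is a minimal sketch of estimating, from logged trails, the probability that a visitor on one page goes next to another. The page names, the sample trails, and the function name `transition_probabilities` are invented for illustration; this is not a description of what Amazon actually ran.

```python
from collections import Counter, defaultdict


def transition_probabilities(trails):
    """Estimate P(next page | current page) from a list of observed trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        # Count every consecutive (current, next) pair in the trail.
        for current, nxt in zip(trail, trail[1:]):
            counts[current][nxt] += 1
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }


# Hypothetical trails; each list is one visitor's sequence of pages.
trails = [
    ["home", "search", "product", "cart"],
    ["home", "search", "search", "product"],
    ["home", "product", "cart"],
    ["home", "search", "exit"],
]
probs = transition_probabilities(trails)
print(probs["search"])
```

A page whose most likely next step is "exit" or a repeated search is a candidate for where people go astray.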

Similarly, Google and Microsoft learn from people using Web search. When people find what they want, Google notices. When other people do that same search later, Google has learned from earlier searchers, and makes it easier for the new searchers to get where they want to go.

Beyond a single search, the search giants watch what people look for over time as they do many searches—what they eventually find or whether they find nothing, where they navigate to after searching—and learn to push future searchers onto the more successful paths trod by those before them.

So, let’s say we have millions of students learning math on computers. Let’s say we have massive new logs of what these students are doing and how well they are doing. What would a big Internet company do with this data? What would be the Googley thing to do with these logs? What would massive-scale data mining look like for students?

We could learn that students who have difficulty solving one problem would have trouble with another. For example, perhaps students who have difficulty with the problem (3x − 7 = 3) have difficulty with (2x − 13 = 5).
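One simple way to look for such a relationship, sketched here under assumptions of my own, is to compare how often students fail the second problem overall with how often they fail it given that they failed the first. The log format (a dictionary from student to per-problem outcomes), the student IDs, and the function name `failure_rates` are hypothetical, and the data is made up.

```python
def failure_rates(results, a, b):
    """Return (P(fail b | failed a), P(fail b)) over students who tried both.

    results maps a student id to a dict of problem -> True if solved.
    Returns None when there is not enough data to compute both rates.
    """
    both = [r for r in results.values() if a in r and b in r]
    failed_a = [r for r in both if not r[a]]
    if not both or not failed_a:
        return None
    p_fail_b = sum(not r[b] for r in both) / len(both)
    p_fail_b_given_a = sum(not r[b] for r in failed_a) / len(failed_a)
    return p_fail_b_given_a, p_fail_b


# Made-up outcomes for eight students on the two example problems.
A, B = "3x-7=3", "2x-13=5"
results = {
    "s1": {A: False, B: False},
    "s2": {A: False, B: False},
    "s3": {A: False, B: True},
    "s4": {A: True, B: True},
    "s5": {A: True, B: True},
    "s6": {A: True, B: False},
    "s7": {A: True, B: True},
    "s8": {A: True, B: True},
}
given_a, overall = failure_rates(results, A, B)
print(given_a, overall)
```

If the conditional failure rate is well above the overall rate, the two problems are likely to share a source of difficulty. With real logs one would also want enough students per pair to trust the difference.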

We could then learn of clusters of problems that are all difficult for students who share the same misunderstanding of an underlying concept. For example, perhaps many students who cannot solve (3x − 7 = 3) and similar problems are confused about how to move the −7 to the other side of the equation.

Also, we could discover the problems in that cluster that are particularly likely to teach that concept well, breaking students out of the misunderstanding so that they can then solve all the problems they previously found so difficult. For example, perhaps students who have difficulty with (3x − 7 = 3) and similar problems are usually able to solve that problem when presented first with the easier problems (x − 5 = 0) and (2x − 3 = 1).
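As a rough sketch of how the logs could surface such helpful problems, the following compares first-attempt success on a target problem for students who saw a candidate problem beforehand against those who did not. The sequence format, the sample data, and the function name `success_by_prior` are assumptions for illustration. The comparison is purely observational, so a real system would want randomized ordering before trusting it.

```python
def success_by_prior(sequences, target, candidate):
    """Compare first-attempt success on target with and without candidate first.

    sequences is a list of per-student lists of (problem, solved) in the
    order attempted. Returns (rate_with_candidate_first, rate_without);
    a rate is None if no student falls in that group.
    """
    with_prior, without_prior = [], []
    for seq in sequences:
        seen_candidate = False
        for problem, solved in seq:
            if problem == target:
                group = with_prior if seen_candidate else without_prior
                group.append(solved)
                break  # only the first attempt at the target counts
            if problem == candidate:
                seen_candidate = True

    def rate(outcomes):
        return sum(outcomes) / len(outcomes) if outcomes else None

    return rate(with_prior), rate(without_prior)


# Made-up attempt sequences for six students.
sequences = [
    [("x-5=0", True), ("3x-7=3", True)],
    [("x-5=0", True), ("2x-3=1", True), ("3x-7=3", True)],
    [("x-5=0", False), ("3x-7=3", False)],
    [("3x-7=3", False)],
    [("3x-7=3", False), ("x-5=0", True)],
    [("2x-3=1", True), ("3x-7=3", True)],
]
with_rate, without_rate = success_by_prior(sequences, "3x-7=3", "x-5=0")
print(with_rate, without_rate)
```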

Then we could learn paths through clusters of problems that are particularly effective and rapid for students. Teachers might think one concept should always be taught before another, but what if the data shows us different? What if we reorder the problems and students learn faster?

We could even learn personalized and individualized paths for effective and rapid learning. Some students might start on a generic path, show early mastery, and jump ahead. Others might struggle with one type of problem or another. Each time a student struggles, we will try them on problems that might be a path for them to learn the underlying concepts and succeed. We will know these paths because so many others struggled before, some of whom found success.

As we experiment, as millions of students try different exercises, we forget the paths that consistently lead to continued struggles, remember the ones that lead to rapid mastery, and, as new students come in, put them on the successful paths we have seen before.
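One standard way to balance experimenting with reusing what already works is an epsilon-greedy bandit, sketched below. This is my choice of technique for illustration, not something the scenario above prescribes. The class name `PathSelector`, the path labels, and the simulated mastery rates are all hypothetical.

```python
import random


class PathSelector:
    """Epsilon-greedy choice among candidate learning paths."""

    def __init__(self, paths, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.trials = {p: 0 for p in paths}
        self.successes = {p: 0 for p in paths}

    def mastery_rate(self, path):
        """Observed fraction of students on this path who reached mastery."""
        n = self.trials[path]
        return self.successes[path] / n if n else 0.0

    def choose(self):
        # With probability epsilon, experiment on a random path;
        # otherwise reuse the path with the best observed mastery rate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.trials))
        return max(self.trials, key=self.mastery_rate)

    def record(self, path, mastered):
        self.trials[path] += 1
        self.successes[path] += int(mastered)


# Simulated students; the "true" mastery rates are invented for the demo.
true_rates = {"path_a": 0.3, "path_b": 0.5, "path_c": 0.8}
selector = PathSelector(true_rates, epsilon=0.1, seed=42)
outcomes = random.Random(7)
for _ in range(2000):
    path = selector.choose()
    selector.record(path, outcomes.random() < true_rates[path])
print(selector.trials)
```

Paths that keep producing struggles are chosen less and less often, while occasional random choices keep testing whether a neglected path deserves another look.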

It would be student modeling on a heretofore unseen scale. From tens of millions of students, we automatically learn tens of thousands of models, little trails of success for future students to follow. We experiment, try different students on different problems, discover which exercises cause similar difficulties, and which exercises help students break out of those difficulties. We learn paths in the data and models of the students. We learn to teach.

November 2011 Issue
