Communications of the ACM

ACM News

'Caesar' Conquers Code Review With Crowdsourcing

Computer science teaching assistants have long struggled with the reams of code students generate weekly to fulfill homework assignments. Grading a course section of 10 to 15 students at one hour per student can be grueling. "It was awful. Awful," says Michael Bolin, a software engineer at Facebook, of his days as a TA for an MIT software engineering course in the mid-2000s. "It was awful for the students, too, because the teaching staff turnaround time was so long. We put it off 'til the last possible minute because it was so painful. And that meant they might not get feedback they needed before they did their next problem set."

Now a new system to facilitate code review, called Caesar, has been developed at MIT's Computer Science and Artificial Intelligence Lab (CSAIL). The brainchild of Department of Electrical Engineering and Computer Science Associate Professor Rob Miller and former graduate students Mason Tang and Elena Tatarchenko, Caesar automatically divides and conquers the problem of reviewing the weekly code output of hundreds of students taking Miller's 6.005 Elements of Software Construction and the school's 6.172 Performance Engineering of Software Systems courses.

The system has three components: the code selector, task router, and reviewing interface. Once students have submitted an assignment, the code selector breaks their work into chunks, or classes, and prioritizes those that need review based on features of the code that suggest it will need attention. The task router then assigns these chunks to a diverse crowd of reviewers.
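To make the first two stages concrete, here is a minimal sketch of a selector and a router. It is not Caesar's actual code: the names Chunk, Reviewer, select_chunks, and route are hypothetical, line count stands in for whatever code features the real selector uses, and "diverse" is approximated by mixing reviewer roles while balancing workload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    author: str   # student who wrote the code
    name: str     # e.g. a class name
    source: str   # the code itself


@dataclass
class Reviewer:
    name: str
    role: str     # assumed categories, e.g. "student" or "alum"


def select_chunks(chunks, limit):
    """Code selector (sketch): keep the `limit` chunks most in need of review.

    Line count is a placeholder priority; the real features are not
    described in the article.
    """
    def priority(chunk):
        return len(chunk.source.splitlines())

    return sorted(chunks, key=priority, reverse=True)[:limit]


def route(chunks, reviewers, per_chunk):
    """Task router (sketch): give each chunk up to `per_chunk` reviewers.

    Authors never review their own chunk. Among the rest, prefer a role
    not yet represented on the chunk, then the least-loaded reviewer.
    """
    load = defaultdict(int)
    assignments = {}
    for chunk in chunks:
        eligible = [r for r in reviewers if r.name != chunk.author]
        picked, roles_used = [], set()
        while len(picked) < per_chunk and eligible:
            best = min(
                eligible,
                key=lambda r: (r.role in roles_used, load[r.name], r.name),
            )
            eligible.remove(best)
            picked.append(best)
            roles_used.add(best.role)
            load[best.name] += 1
        assignments[(chunk.author, chunk.name)] = [r.name for r in picked]
    return assignments
```

Calling route(select_chunks(chunks, limit), reviewers, per_chunk) returns a mapping from each selected chunk to its reviewers' names, which a reviewing interface could then present as each person's task list.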

Reviewers include MIT alumni and the students themselves. At the end of the week their assignment is due, students receive feedback on how to make their code clearer, easier to maintain, and less susceptible to bugs. Typically, 60 alumni per semester who took the same course, or one of its predecessors, including Bolin from Facebook, act as reviewers. But the 200 students do most of the reviews.

Such crowdsourcing complements automatic testing for correctness and clarity, says Miller, but "it's a social process and not just looking at someone else's code in isolation. Students benefit from comments by five to seven reviewers with different levels of expertise."

The system "orchestrates small contributions from lots of people," Miller says. "It helps to have lots of eyeballs when under time constraints and they contribute different ideas." Reviewers make comments in the margin, and comments are threaded as they are in Facebook and Google Docs. Extending both the imperial and social networking metaphors, the course staff alerts students to particularly salient reviewer comments by giving them a thumbs up or thumbs down.

MIT junior Ido Efrati says his understanding of a user interface concept improved because of a reviewer's comment on his code for a game called Jotto. "When I first coded my problem set I did not have a good understanding of the Model-View-Controller (MVC) and did not separate my code into three classes," he says. "This resulted in very long and hard to debug software. One comment I got on my code review gave me a better understanding of MVC and how to split and improve my code. This helped me debug it easily."

Tang, who is now a software engineer at Google, says it is interesting to see the creativity and variety of solutions students develop on their own. While it's important to conform to a coding style at least in terms of tedious details, like where to put characters, he would like to see creativity encouraged. "Caesar is a great way for reviewers to see lots of solutions and call out some novel approach that maybe the professor can discuss in the next lecture," he says.

Miller says implementing Caesar involves tradeoffs, however. A crowd of diverse reviewers criticizes small chunks of code line-by-line, but because they don't see an entire program, as TAs in the past did, they do not make global recommendations. "There are things we give up when moving to small-scale crowdsourcing at this level. Reviewers look at how a student writes code, which is comparable to judging the quality of sentences and paragraphs versus the plot of a book," he says.

Bolin volunteered to review students' code because he wants to help undergraduates develop skills for programming in-the-large, the approach described by DeRemer and Kron and popular throughout industry. Unless they've interned at a company, undergraduates don't realize that industrial code must be maintained after its author leaves, he says. Google and Facebook, for example, have mandatory pre-commit code reviews, which means a teammate must review and approve code before it is committed. "Many new employees think there's lots of overhead for submitting code," says Bolin. "Having code reviewed and being forced to read other people's code is a skill most don't have just because of lack of experience. They would be better off if they have that experience." MIT and other good computer science schools focus mainly on theory, Bolin says. "It's hard because many professors haven't spent much time in industry and don't always appreciate that side, hands-on experience, and how much it matters."

Many companies have their own code reviewing systems, or they customize the open tools of Google or Facebook, which are available on code-sharing sites, but open tools don't usually offer a way to find capable reviewers or those with a particular expertise. Also, Miller says it is more efficient for one grader to focus on a single question in a quiz than to grade an entire quiz. Miller is working on features that will enable Caesar to assign reviewers to specific code and, when a reviewer finds a mistake, to identify other code the reviewer was not assigned that contains the same mistake. Caesar would then also send the review to the student who authored that code.

Miller also hopes industry will use Caesar to teach coding. "I'm optimistic that what we learn from crowd code reviewing in classes about task routing, social reviewing, and quality control, will provide both principles and mechanisms to help make learning-by-code-reviewing into a lifetime industry practice," he says.

Karen A. Frenkel lives in New York City and covers technology, innovation, and entrepreneurs.