Learning Programming at Scale

I just finished my third year as an assistant professor, which is roughly the halfway point before applying for tenure. This seems like a good time to take a step back and reflect on the research that I’ve done over the past three years with my students (from both UC San Diego and the University of Rochester) and other collaborators. In this article, I’ll summarize all of the papers published from our work during this time period and show how they all fit together.

My research over the past three years centers on a term that I coined in 2015 called learning programming at scale. It spans the academic fields of human-computer interaction, online learning, and computing education.

Decades of prior research have worked to improve how computer programming is taught in traditional K-12 and university classrooms, but the vast majority of people around the world—children in low-income areas, working adults with full-time jobs, the fast-growing population of older adults, and millions in developing countries—do not have access to high-quality classroom learning environments. Thus, the central question that drives my research is: How can we better understand the millions of people from diverse backgrounds who are now learning programming online and then design scalable software to support their learning goals? To address this question, I study learners using both quantitative and qualitative research methods and also build new kinds of interactive learning systems.

All of my publications since I started as an assistant professor fit into one of three major research directions within learning programming at scale:

1. Understanding why and how people from diverse backgrounds are learning programming
2. Designing new kinds of programming environments to support learners
3. Designing new formats for programming-related instructional materials

Now I’ll give a whirlwind tour of these publications. Click on any of the links below to see their abstracts and to download the full PDFs from my website.

1. Understanding why and how people from diverse backgrounds are learning programming

One critical prerequisite for improving how programming is taught is to understand why and how people are currently learning and what obstacles they face. To work toward this goal, I have been studying traditionally underrepresented learner populations and understudied settings for learning outside of the classroom.

I recently studied how older adults aged 60 and over are learning programming [Guo2017] (CACM blog post). I found that they were often motivated by age-related reasons such as keeping their brains challenged as they aged, making up for missed learning opportunities during their youth, connecting with younger family members, and improving their job prospects. They reported a variety of age-related frustrations such as a perceived decline in cognitive abilities, lack of social opportunities to interact with tutors and peers, and trouble dealing with constantly changing software technologies.

With Chris Parnin and his students, we studied the unique challenges faced by female programmers when seeking and providing help on the popular Stack Overflow question-and-answer website [Ford2016]. We found five participation barriers that affected women more than men: 1) not being aware of certain features of the site, 2) not feeling qualified enough to chime in with questions and answers, 3) being intimidated by the large size of the online community, 4) discomfort from interacting with strangers online, and 5) fear of appearing like they are slacking on the job by spending time on that website.

My student Jeremy Warner and I also studied the recent phenomenon of hackathons where college students gather for 24-to-36-hour periods to learn coding by creating software prototypes [Warner2017hack]. We found that the time-limited format of hackathons generated excitement and focus and that learning occurred incidentally, opportunistically, and from peers. However, some students were discouraged from attending by perceptions of an overly competitive climate, an unwelcoming culture, and fears of not having enough prior experience; these factors were reported more frequently by women than by men.

Finally, Parmit Chilana and I identified and studied an emerging population of college students [Chilana2015] and professionals at technology companies [Chilana2016] (joint work with Rishabh Singh) who want to learn programming but do not actually need to write code for their jobs. We call these people conversational programmers since their main goal is to learn just enough about programming to be able to hold productive technical conversations with programmers.

2. Designing new kinds of programming environments to support learners

Current programming environments are designed to maximize the productivity of professionals who are already experts. Instead, I’ve been creating new environments to address the unique challenges faced by novice programmers, which can hopefully ease their path to eventually becoming experts.

First I built a series of tools to help novices overcome a fundamental barrier to learning programming: understanding what happens "under the hood" as the computer runs each line of source code. These tools are all built on top of the Python Tutor web-based programming environment (pythontutor.com) that I created in 2010 [Guo2013]. Python Tutor (despite its outdated name!) lets users write code in languages such as Python, Java, C, C++, JavaScript, TypeScript, and Ruby; it runs the user’s code and automatically visualizes what the computer is doing step-by-step. So far, over 3.5 million people from over 180 countries have used Python Tutor to understand and debug their code, which provides a large global user base to test the new tools that are built on top of it.
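To make "visualizes what the computer is doing step-by-step" a bit more concrete, here is a minimal sketch of how a tool could record a step-by-step execution trace in Python. This is not Python Tutor's actual implementation; it is an illustration under stated assumptions that uses the standard library's sys.settrace hook, and the names trace_steps and "<user_code>" are hypothetical ones chosen for this example.

```python
import sys


def trace_steps(source):
    """Run `source` and record (line number, variables) just before each line executes.

    Illustrative sketch only: trace_steps and "<user_code>" are hypothetical names,
    not part of Python Tutor.
    """
    steps = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the user's code.
        if frame.f_code.co_filename != "<user_code>":
            return None
        if event == "line":
            # Snapshot the variables visible in this frame, skipping
            # interpreter-provided names such as __builtins__.
            snapshot = {
                name: value
                for name, value in frame.f_locals.items()
                if not name.startswith("__")
            }
            steps.append((frame.f_lineno, snapshot))
        return tracer

    code = compile(source, "<user_code>", "exec")
    namespace = {}
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        # Always remove the trace hook, even if the user's code raises.
        sys.settrace(None)
    return steps


if __name__ == "__main__":
    for line_number, variables in trace_steps("x = 1\ny = x + 2\n"):
        print(line_number, variables)
```

Each recorded step pairs a line number with the variables that existed just before that line ran. A visualizer could render such a list one step at a time, which is the general idea behind stepping forward and backward through a program's execution.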

I extended Python Tutor with a real-time collaborative mode called Codechella [Guo2015chella], which lets multiple users connect to the same visualization session and work together to solve programming problems and tutor one another. I followed up on these ideas with Codeopticon [Guo2015opticon], a real-time activity monitoring dashboard that allows a single tutor to simultaneously watch dozens of people working on the Python Tutor website and jump in to tutor multiple learners at once. My student Hyeonsu Kang and I then morphed Python Tutor into Omnicode [Kang2017], a live programming environment that continually visualizes the entire history of all numerical program values to give programmers a bird’s-eye view of execution. Finally, with Chris Parnin and our student Ian Drosos, we created HappyFace [Drosos2017], a medically-inspired pain scale embedded into Python Tutor to let users self-report their frustration levels.

The second set of challenges that I tackle here relates to the fact that novices have trouble installing, configuring, and managing the complex array of software tools required to become productive as programmers. I coined a term called command-line BS-ery to lovingly refer to these sources of extrinsic complexity that demoralize novices. To help remove these complexities, Jeremy Warner and I created CodePilot [Warner2017pilot], a programming environment that lets novices quickly get started with pair programming and test-driven development by integrating real-time collaborative coding, testing, bug reporting, and version control management into a single unified system. In a similar vein, my student Xiong Zhang and I created DS.js [Zhang2017], a bookmarklet that lets novices get started learning data science by writing code to analyze structured data directly on any existing webpage instead of needing to download data sets and configure data analysis software on their own computers.

3. Designing new formats for programming-related instructional materials

My third major research direction involves studying the shortcomings of existing formats for programming-related instructional materials and then designing new instructional formats that improve the user experience for both creators and consumers of those materials.

In collaboration with Brad Miller and his student, Jeremy Warner and I studied how people used a popular digital textbook for learning programming, How to Think Like a Computer Scientist: Interactive Edition [Warner2015]. We found that the ability to execute code and see it visualized using Python Tutor directly within the context of textbook lessons was especially popular amongst readers.

My students (led by Joyce Zhu) and I analyzed all of the discussion forum messages in a popular MOOC, Introduction to Computer Science and Programming Using Python [Zhu2015]. We found that people often wanted to discuss run-time code execution state but had lots of trouble doing so since forums are purely text-based. From these findings, we proposed that a better discussion forum for learning programming should integrate automatically generated visualizations of execution state and enable inline annotations of source code and output.

Inspired by the above work, my student Mitchell Gordon and I created Codepourri [Gordon2015], a new tutorial format and crowdsourcing workflow that lets Python Tutor users work together to create step-by-step tutorials by directly annotating run-time code visualizations. Since there are far more learners than experts, using learners as a volunteer crowd of workers is a potentially more scalable way to create coding tutorials than relying solely on experts. We found that crowd-created tutorials for simple code were accurate, informative, and even contained some insights that expert instructors missed.

Finally, back to eliminating command-line BS-ery, my student Alok Mysore and I created Torta [Mysore2017], a macOS app that allows users to create step-by-step tutorials that span multiple command-line and GUI applications by simply demonstrating the intended actions on their computers. The Torta system: a) automatically records a screencast video along with operating system events such as filesystem modifications, file diffs, and process invocations, b) generates a new kind of mixed-media tutorial with the benefits of both video and text formats, and c) gives step-by-step feedback to people who are following the tutorial and even automatically runs certain steps for them.

That’s all for now, folks. In the coming years, I plan to keep building momentum along these three directions and then gradually expand outward. Stay tuned!

References

[Guo2017] Philip J. Guo. Older Adults Learning Computer Programming: Motivations, Frustrations, and Design Opportunities. ACM Conference on Human Factors in Computing Systems (CHI), 2017. (Honorable Mention Paper Award)
[Warner2017pilot] Jeremy Warner and Philip J. Guo. CodePilot: Scaffolding End-to-End Collaborative Software Development for Novice Programmers. ACM Conference on Human Factors in Computing Systems (CHI), 2017.
[Warner2017hack] Jeremy Warner and Philip J. Guo. Hack.edu: Examining How College Hackathons Are Perceived By Student Attendees and Non-Attendees. ACM International Computing Education Research conference (ICER), 2017.
[Mysore2017] Alok Mysore and Philip J. Guo. Torta: Generating Mixed-Media GUI and Command-Line App Tutorials Using Operating-System-Wide Activity Tracing. ACM Symposium on User Interface Software and Technology (UIST), 2017.
[Zhang2017] Xiong Zhang and Philip J. Guo. DS.js: Turn Any Webpage into an Example-Centric Live Programming Environment for Learning Data Science. ACM Symposium on User Interface Software and Technology (UIST), 2017. (Honorable Mention Paper Award)
[Kang2017] Hyeonsu Kang and Philip J. Guo. Omnicode: A Novice-Oriented Live Programming Environment with Always-On Run-Time Value Visualizations. ACM Symposium on User Interface Software and Technology (UIST), 2017.
[Drosos2017] Ian Drosos, Philip J. Guo, Chris Parnin. HappyFace: Identifying and Predicting Frustrating Obstacles for Learning Programming at Scale. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2017.
[Ford2016] Denae Ford, Justin Smith, Philip J. Guo, Chris Parnin. Paradise Unplugged: Identifying Barriers for Female Participation on Stack Overflow. ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE), 2016.
[Chilana2016] Parmit K. Chilana, Rishabh Singh, Philip J. Guo. Understanding Conversational Programmers: A Perspective from the Software Industry. ACM Conference on Human Factors in Computing Systems (CHI), 2016.
[Chilana2015] Parmit K. Chilana, Celena Alcock, Shruti Dembla, Anson Ho, Ada Hurst, Brett Armstrong, Philip J. Guo. Perceptions of Non-CS Majors in Intro Programming: The Rise of the Conversational Programmer. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015.
[Guo2015opticon] Philip J. Guo. Codeopticon: Real-Time, One-To-Many Human Tutoring for Computer Programming. ACM Symposium on User Interface Software and Technology (UIST), 2015.
[Guo2015chella] Philip J. Guo, Jeffery White, Renan Zanelatto. Codechella: Multi-User Program Visualizations for Real-Time Tutoring and Collaborative Learning. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015.
[Gordon2015] Mitchell Gordon and Philip J. Guo. Codepourri: Creating Visual Coding Tutorials Using A Volunteer Crowd Of Learners. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015.
[Zhu2015] Joyce Zhu, Jeremy Warner, Mitchell Gordon, Jeffery White, Renan Zanelatto, Philip J. Guo. Toward a Domain-Specific Visual Discussion Forum for Learning Computer Programming: An Empirical Study of a Popular MOOC Forum. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2015.
[Warner2015] Jeremy Warner, John Doorenbos, Bradley N. Miller, Philip J. Guo. How High School, College, and Online Students Differentially Engage with an Interactive Digital Textbook. International Conference on Educational Data Mining (EDM), short paper, 2015.
[Guo2013] Philip J. Guo. Online Python Tutor: Embeddable Web-Based Program Visualization for CS Education. ACM Technical Symposium on Computer Science Education (SIGCSE), 2013.
