A series of recent reports claim the U.S. education system is in severe crisis; others suggest the crisis is “overblown.” On the one hand, the National Academies released a report, “Rising Above the Gathering Storm, Revisited: Rapidly Approaching Category 5,”6 which argued that the U.S. economy is at risk because innovation will suffer due to poor-quality science education. The President’s Council of Advisors on Science and Technology (PCAST) stated in its report “Prepare and Inspire: K-12 Education in Science, Technology, Engineering, and Math (STEM) for America’s Future”8 that there are “troubling signs” about U.S. STEM education. In particular, the Council of Advisors’ report called out the importance of knowing about computing; for example, it says “a basic understanding of technology and engineering is important if our children are to contribute to and compete in a rapidly changing society and an increasingly interconnected global community.”
On the other hand, an essay from Nicholas Lemann in a recent issue of The New Yorker referred to the crisis in American education as “overblown.”3 Lemann points out that the American system of mass higher education is “one of the great achievements of American democracy.” In September, a New York Times article pointed to rising unemployment in the technology sector, suggesting that maybe we have too many computing graduates.7
All of these reports might be right. An explanation that satisfies all these claims is that we are educating large numbers of students, as Lemann suggests, but not well enough to address the needs described in the National Academies and PCAST reports. The unemployed technology workers described by the New York Times may not have the right skills or knowledge to get the jobs that will fuel innovation.
Computing education research has a role to play here. If these reports are right, we need to produce more graduates with a higher level of knowledge and skill. Computing education research can help us figure out where the shortcomings are in the U.S. education system, and how to address them.
The Sorry State of CS1
The introductory course in computer science in higher education is often referred to as “CS1,” after the name used in early ACM and IEEE curriculum volumes. One of the first efforts to measure performance in CS1 was a series of studies by Elliot Soloway and his colleagues at Yale University. They regularly used the same problem, called “The Rainfall Problem”: Write a program that repeatedly reads in positive integers, until it reads the integer 99999. After seeing 99999, it should print out the average of the values read (the sentinel itself is not included). In one study, only 14% of students in Yale’s CS1 could solve this problem correctly.9 The Rainfall Problem has been used under test conditions and as a take-home programming assignment, and is typically graded so that syntax errors don’t count, though adding a negative value or 99999 into the total is an automatic zero. Every study that I’ve seen (the latest in 2009) that has used the Rainfall Problem has found similarly dismal performance, on a problem that seems amazingly simple.
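To make the task concrete, here is a minimal sketch of one possible Rainfall solution, written in Python only for illustration (the studies were run in whatever language each course taught). It shows the pieces a student must coordinate: a sentinel-controlled loop, filtering of invalid input, and a guarded division.

def rainfall():
    # Read integers until the sentinel 99999, then print the average.
    # Negative values and the sentinel itself are excluded from the total.
    total = 0
    count = 0
    while True:
        value = int(input())
        if value == 99999:   # sentinel: stop reading
            break
        if value < 0:        # ignore invalid (negative) input
            continue
        total += value
        count += 1
    if count > 0:            # guard against dividing by zero
        print(total / count)
    else:
        print("No valid values were entered.")

if __name__ == "__main__":
    rainfall()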
Mike McCracken realized that the weakness of Soloway’s studies, or of any similar study, is that a single campus could simply get it wrong. Maybe Yale just taught CS1 badly. McCracken wanted to find problems that students might be having with CS1 in general. He organized a multi-institutional, multinational (MIMN) study, with student data on the same problem collected from four institutions in three different countries.5 One place might get it wrong, use the “wrong” language, or use “objects-first” when it ought to do “objects-later” (or make some other pedagogical trade-off). Studying a range of schools helps us describe “traditional” teaching of a subject, and the student outcomes from that teaching. McCracken’s group asked students to evaluate arithmetic expressions whose numbers and operations appeared in a text file (prefix, postfix, or infix notation, the student’s choice). A total of 215 CS1 students participated in the study. The average score was 21%. Many of the participants never got past the design part of the problem to write any code at all.
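For a sense of what such a solution involves, here is a hedged Python sketch that evaluates postfix expressions read from a text file, one expression per line. The file name, token format, and the choice of postfix notation are assumptions made for illustration, not the study’s actual specification.

import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_postfix(tokens):
    # Evaluate a postfix expression given as a list of tokens, e.g. ["3", "4", "+"].
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()   # operands come off the stack in reverse order
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack.pop()

def main(path="expressions.txt"):   # file name is an assumption for illustration
    with open(path) as f:
        for line in f:
            if line.strip():
                print(eval_postfix(line.split()))

if __name__ == "__main__":
    main()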
Raymond Lister thought that McCracken’s study might have been asking students to do too much, since they had to both design solutions and implement programs. He organized another MIMN study, this time asking students to read and trace code. In all, 556 students from institutions across seven countries completed 12 multiple-choice questions involving iteration and array manipulation. The average score was 60%; 23% of the students got only four or fewer questions correct.
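The flavor of those tracing questions can be suggested with a small fragment like the one below. This is a constructed example in Python, not one of the study’s actual items; students are asked to predict the output without running the code.

# Constructed tracing question: what does this code print?
values = [2, 7, 1, 8, 3]
result = values[0]
for i in range(1, len(values)):
    if values[i] > result:
        result = values[i]
print(result)   # a correct trace yields 8, the largest element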
The most recent evaluation of CS1 is in the dissertation by Allison Elliott Tew, whom I advised.10 Elliott Tew has been interested in how we can compare performance across different kinds of CS1, especially when the language varies. She hypothesized that students could take a test written in a pseudocode designed to be easy to read, and that their scores would correlate well with how they performed in whatever their “native” CS1 language was. Before she created her test, though, she had to define what we mean by “CS1.”
Since Elliott Tew wanted her test to be usable in a variety of different kinds of classes, she tried to define a small subset of what different people saw as “CS1 knowledge.” She looked at popular CS1 textbooks to find the intersection of their topics, and used the ACM/IEEE curricular volumes to identify only those topics recommended for CS1. In the end, she defined a very small subset of what anyone teaches in CS1 as the “CS1 knowledge” she would test.
Elliott Tew created an exam with 27 questions in each of MATLAB, Python, Java, and her pseudocode. Each of her 952 subjects, from three institutions in two countries, completed two exams: one in her pseudocode, and one in their “native” language. She found that the correlation between the pseudocode and “native”-language scores was very high, and that the correlation between the pseudocode scores and the students’ final exam grades in CS1 was also very high.
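As a minimal sketch of what such a check looks like (this is not Elliott Tew’s actual analysis, and the scores below are invented for illustration), one can compute Pearson’s correlation over paired scores:

from statistics import correlation   # Pearson's r; available in Python 3.10+

# Hypothetical paired scores (percent correct) for a handful of students.
pseudocode_scores = [30, 55, 20, 70, 45]
native_scores     = [42, 63, 28, 81, 50]

r = correlation(pseudocode_scores, native_scores)
print(f"Pearson correlation: {r:.2f}")   # values near 1.0 indicate strong agreement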
Elliott Tew’s results make a strong case that pseudocode can be used effectively in a language-independent test for CS1 knowledge and that her test, in particular, is testing the same kinds of things that CS1 teachers are looking for on their final exams. But the relevant finding is that the majority of her 952 test-takers failed both of her exams, based on a small subset of what anyone teaches in CS1. The average score on the pseudocode exam was 33.78%, and 48.61% on the “native” language exam.10
These four studies4,5,9,10 paint a picture of a nearly three-decades-long failure to teach CS1 to a level that we would expect. They span decades, a variety of languages, and different teaching methodologies, yet the outcomes are surprisingly similar. Certainly, the majority of students do pass CS1, but maybe they shouldn’t be passing. Each of these four efforts to objectively measure student performance in CS1 has ended with the majority of students falling short of what we might reasonably consider passable performance. The last three studies,4,5,10 in particular, have each attempted to define a smaller and smaller subset of what we might consider to be CS1 knowledge. Yet performance is still dismal. We have not yet found a definition of CS1 learning outcomes small enough that the majority of students achieve them.
There are many possible explanations for why students perform so badly on these measures. These studies may be flawed. Perhaps students are entering the courses unprepared for CS1. Perhaps our expectations for CS1 are simply too high, and we should not actually expect students to achieve those learning objectives after only a single course. Perhaps we just teach CS1 badly. I argue that, regardless of the explanation, these four studies set us up for success.
From Science to Engineering
In 1985, Halloun and Hestenes published a careful study of their use of the Mechanics Diagnostic Test, later updated as the Force Concept Inventory.2 The Force Concept Inventory (FCI) gave physics education researchers a valid and reliable yardstick by which to compare different approaches to teaching physics knowledge about force. The traditional approach was clearly failing. Hestenes reported that while “nearly 80% of the students could state Newton’s Third Law at the beginning of the course…FCI data showed that less than 15% of them fully understood it at the end.”
FCI as a yardstick was the result of physics education research as science. Scientists define phenomena and develop instruments for measuring those phenomena. Like computer scientists, education researchers are both scientists and engineers. We not only aim to define and measure learning—we develop methods for changing and improving it.
Physics education researchers defined a set of new methods for teaching physics called interactive-engagement methods. These methods move away from traditional lecture and instead focus on engaging students in working with the physics content. In a study with a stunning 6,000-plus participants, interactive-engagement methods were clearly established to be superior to traditional methods for teaching physics.1 Once the yardstick was created, it was possible to engineer new ways of teaching and compare them against the yardstick.
The demands of the “Rising Above the Gathering Storm” and PCAST reports call on computing education researchers to be engineers, informed by science. These four studies establish a set of measures for CS1 knowledge. There are likely flaws in these measures. More and better measures can and should be developed. There is much that we do not know about how students learn introductory computing. There is plenty of need for more science.
But even with just these studies, we have significant results showing that our current practice does not measure up to our goals. We have four yardsticks to use for measuring progress toward those goals. Simply doing better against these yardsticks would be an improvement in how we teach computer science. Our challenge is to do better. We need to develop better ways of teaching computer science, like the physics educators’ interactive-engagement methods. We need to publish our better methods and demonstrate successful performance against common yardsticks.
Computing education researchers have proven themselves as scientists. The next challenge is to prove we can also be engineers. Computing education needs to build on the science and show measurable progress toward the needs identified by that science.