The Inverse Lake Wobegon Effect

Every episode of the long-running radio variety show A Prairie Home Companion includes a segment where host Garrison Keillor tells stories from his mythical hometown, Lake Wobegon. Each of these segments ends with the sentence, "Well, that’s the news from Lake Wobegon, where all the women are strong, all the men are good looking, and all the children are above average." That notion, that "all the children are above average," is an example of what’s now known as the Lake Wobegon Effect (see description here), also known as "illusory superiority."

The Lake Wobegon Effect is the tendency to regard the small sample in our own experience as superior to the overall population. A concrete example: 80% of drivers consider themselves to be above-average drivers. That can’t be true (at most half of all drivers can be better than the median driver), but we commonly believe that what we experience is above average.

The Inverse Lake Wobegon Effect is a term I’m coining for a fallacy that I sometimes see in computer science education: we sample from a clearly biased source and assume the sample describes the overall population. We know we’re observing a superior sample, but we act as if we were getting a random sample. This is a form of sampling bias.

I introduce the term in a new book that I just published with Morgan & Claypool, Learner-Centered Design of Computing Education: Research on Computing for Everyone (see blog post on book). One example of the Inverse Lake Wobegon Effect in CS Ed is assuming that a successful undergraduate introductory curriculum will be similarly successful in high school. Students in undergraduate education are elite. In the United States, undergraduates are screened in an application process and are certainly in the top half of most scales (e.g., intellectual achievement and wealth). Elite students can learn under conditions in which more average students might not succeed, which educators call aptitude-treatment interactions (see description here).

Consider Bennedsen and Caspersen’s work on predictors for success in introductory computing (see ACM DL link). Undergraduates have better math grades and more coursework than average high school students, and both of those factors predict success in introductory CS courses. Think about the role of algebra in programming. There are high schools here in Atlanta where fewer than half the students pass algebra. A single CS curriculum that assumes success with algebra is highly unlikely to work well for both undergraduate and high school audiences.

Imagine a highly successful undergraduate introductory computing curriculum in which 80% of the students succeed. That’s 80% of students from the top half of whatever scale we’re talking about, which is only 40% of the general population. If few students from the bottom half succeed, the same curriculum might fail for 60% of the general population.
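
The arithmetic behind that 60% figure can be sketched in a few lines of Python. This is only an illustration: the 80% success rate in the top half comes from the example above, while the function name and the bottom-half success rates are assumptions I made up to show how the overall number moves.

```python
def overall_success_rate(top_rate, bottom_rate, top_share=0.5):
    """Success rate across the whole population, as a weighted average of two groups."""
    return top_share * top_rate + (1 - top_share) * bottom_rate


# 0.8 is the success rate from the example above.
# The bottom-half rates are hypothetical values, not measured data.
for bottom_rate in (0.0, 0.2, 0.4):
    overall = overall_success_rate(0.8, bottom_rate)
    print(f"bottom-half success {bottom_rate:.0%}: "
          f"overall success {overall:.0%}, failure {1 - overall:.0%}")
```

The 60% failure figure corresponds to the worst case, where nobody in the bottom half succeeds. Even under the more generous assumed rates, the overall success rate stays well below the 80% we observed in the undergraduate sample.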

We see a similar sampling error when we talk about using MOOC data to inform our understanding of learning. The edX website says that it offers a powerful platform for "exploring how students learn" (see page here). Students who take MOOCs are overwhelmingly well-educated, employed, and from developed countries (see papers here and here), characteristics that describe only a small percentage of the overall population. We can’t assume that what we learn from the biased sample of MOOC participants describes the general population. Psychologists are already concerned that many of their findings are biased because they over-sample from "WEIRD" students:

They found that people from Western, educated, industrialized, rich and democratic (WEIRD) societies — who represent as much as 80 percent of study participants, but only 12 percent of the world’s population — are not only unrepresentative of humans as a species, but on many measures they’re outliers. (see APA piece here).

It’s easy to fall prey to the Inverse Lake Wobegon Effect. Those of us who work at colleges and universities teach only undergraduate and graduate students, and it’s easy for us to believe that those students represent all students. If we’re really aiming at computing for everyone, we have to realize that we don’t see everyone on our campuses. We have to design explicitly for those new audiences.