BLOG@CACM
# The Inverse Lake Wobegon Effect

Every episode of the long-running radio variety show *A Prairie Home Companion* includes a segment where host Garrison Keillor tells stories from his mythical hometown, Lake Wobegon. Each of these segments ends with the sentence, "Well, that's the news from Lake Wobegon, where all the women are strong, all the men are good looking, and all the children are above average." That notion, that "all the children are above average," is an example of what's now known as the *Lake Wobegon Effect* (see description here), also known as "illusory superiority."

The Lake Wobegon Effect is the tendency to regard the small sample in our own experience as superior to the population overall. A concrete example: 80% of drivers consider themselves to be above-average drivers. Obviously that can't be true, but we commonly think that what we experience is above average.

The *Inverse Lake Wobegon Effect* is a term I'm coining for a fallacy that I sometimes see in computer science education. The Inverse Lake Wobegon Effect is when we sample from a clearly biased source and assume the sample describes the overall population. We know we're observing a superior sample, but we act as if we're getting a random sample. This is a form of sampling bias.

I introduce the term in a new book that I just published with Morgan & Claypool, *Learner-Centered Design of Computing Education: Research on Computing for Everyone* (see blog post on book). One example of the Inverse Lake Wobegon Effect in CS Ed is assuming that a successful undergraduate introductory curriculum will be similarly successful in high school. Students in undergraduate education are elite. In the United States, undergraduates are screened in an application process and are certainly in the top half of most scales (e.g., intellectual achievement and wealth). Elite students can learn under conditions in which more average students might not succeed, an instance of what educators call *aptitude-treatment interactions* (see description here).

Consider Bennedsen and Caspersen's work on predictors for success in introductory computing (see ACM DL link). Students in undergraduate education have better math grades and have more course work than average students in high school, and both of those factors predict success in introductory CS courses. Think about the role of algebra in programming. There are high schools here in Atlanta where fewer than half the students pass algebra. A single CS curriculum that assumes success with algebra is highly unlikely to work well for both undergraduate and high school audiences.

Imagine a highly successful undergraduate introductory computing curriculum in which 80% of the students succeed. That's 80% of students from the top half of whatever scale we're talking about. The same curriculum might fail for 60% of the general population.
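One way to arrive at a figure like 60% is a worst-case reading of that scenario. The sketch below is only illustrative: the function name `overall_failure_rate` and the success rates assigned to the bottom half are assumptions made for the example, not data from any study.

```python
def overall_failure_rate(top_share, top_success, rest_success):
    """Failure rate across the whole population, given success rates in two groups."""
    overall_success = top_share * top_success + (1 - top_share) * rest_success
    return 1 - overall_success


# Worst case (an assumption for illustration): nobody outside the top half succeeds.
worst = overall_failure_rate(top_share=0.5, top_success=0.8, rest_success=0.0)

# Milder hypothetical: half of the remaining students succeed.
milder = overall_failure_rate(top_share=0.5, top_success=0.8, rest_success=0.5)

print(f"worst case: {worst:.0%} fail")
print(f"milder case: {milder:.0%} fail")
```

Under the worst-case assumption, the 80% success rate in the top half is only a 40% success rate overall. The point is not the exact number but that a success rate measured on a screened sample says little about the unscreened population.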

We see a similar sampling error when we talk about using MOOC data to inform our understanding of learning. The edX website says that they offer a powerful platform for "exploring how students learn" (see page here). Students who take MOOCs are overwhelmingly well-educated, employed, and from developed countries (see papers here and here) -- characteristics which describe only a small percentage of the overall population. We can't assume that what we learn from the biased sample of MOOC participants describes the general population. Psychologists are already concerned that many of their findings are biased because they over-sample from "WEIRD" students:

> They found that people from Western, educated, industrialized, rich and democratic (WEIRD) societies -- who represent as much as 80 percent of study participants, but only 12 percent of the world’s population -- are not only unrepresentative of humans as a species, but on many measures they’re outliers.

(see APA piece here)

It's easy to fall prey to the Inverse Lake Wobegon Effect. Those of us who work at colleges and universities only teach undergraduate and graduate students. It's easy for us to believe that those students represent all students. If we're really aiming at computing for *everyone*, we have to realize that we don't see *everyone* on our campuses. We have to design explicitly for those new audiences.

Well-argued point, which I hope educators will take into account. Only an unreconstructed quibbler would point out that it is the statistical distribution that makes it impossible for 80% of drivers to be above average. For sure, though, 80% cannot be above the median.
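The quibble turns on the difference between the mean and the median. As a minimal sketch, the scores below are made up for illustration (they are not real driving data): a couple of very low values drag the mean down, so most of the group sits above it, while no more than half can ever sit above the median.

```python
from statistics import mean, median

# Hypothetical skill scores, invented for this example:
# eight decent drivers and two very poor ones.
scores = [70, 72, 74, 75, 76, 78, 80, 82, 10, 5]

avg = mean(scores)
med = median(scores)

# Fraction of drivers strictly above each measure of "average".
share_above_mean = sum(s > avg for s in scores) / len(scores)
share_above_median = sum(s > med for s in scores) / len(scores)

print(f"mean = {avg}, share above mean = {share_above_mean:.0%}")
print(f"median = {med}, share above median = {share_above_median:.0%}")
```

With this skewed sample, 80% of the scores are above the mean, but only half are above the median.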

I recently read an article in our local paper (http://bit.ly/1qjaMSW) about a high-school student who designed a CS curriculum for the AP CS exam; he's got a course on Udemy (https://www.udemy.com/decoding-ap-computer-science-a/) for high school students... I've no idea whether his high school environment is also subject to the Inverse Lake Wobegon Effect, but I'm guessing the course is much more accessible to the average high-school student than a college-level course would be.

High school CS courses are also subject to the Inverse Lake Wobegon Effect. Consider where AP CS is taught, and where it's not. I wrote a blog post showing a map of Georgia (created by Tom McKlin) with darker colors indicating wealth in the county and yellow dots indicating where AP CS is taught (see https://computinged.wordpress.com/2014/10/10/where-ap-cs-is-taught-in-georgia-and-where-there-is-no-cs-at-all/). The yellow dots cluster in the rich counties. The south of Georgia (poorer than the north) has no AP CS at all. It's not reasonable to expect that an approach that works with rich students (e.g., with ubiquitous access to computing devices, with high-speed Internet in their homes) will work identically in the poor districts.

I see similar bias when high schools say that they're going to use the Beauty and Joy of Computing curriculum (http://bjc.berkeley.edu/) because it's been so successful and so diverse at UC Berkeley (http://www.slate.com/blogs/future_tense/2014/02/21/a_berkeley_intro_computer_science_course_has_more_women_than_men_for_the.html). And it has been super successful there! But Berkeley undergraduates are elite, having been admitted to a highly successful institution. Most high school students are not like Berkeley undergraduates. BJC is being adapted for high school settings, and comparing to those adaptations makes far more sense. Putting something into high schools because it works in undergrad institutions is absolutely an Inverse Lake Wobegon Effect bias.
