What Do Scientists and Engineers Need to Know About Computer Science?

Georgia Institute of Technology Professor Mark Guzdial

A new effort at the Texas Advanced Computing Center is aimed at teaching scientists and engineers about supercomputing. They argue that "Anyone looking to do relevant computational work today in the sciences and engineering must have these skills." They offer a certificate or portfolio in "Scientific Computation."

Greg Wilson has been going after this same goal using a different strategy. He suggests that before we can teach scientists and engineers about high-performance computing, we first have to teach them about computing. He leads an effort called "Software Carpentry" to figure out what to teach scientists and engineers about computing:

I’ve been trying to construct a better answer for the past thirteen years; Software Carpentry (http://software-carpentry.org/blog/) is what I’ve arrived at. It’s the 10% of software engineering (with a very small ‘e’) that scientists and engineers need to know *before* they tackle GPUs, clusters, and other fashionable Everests. Like sanitation and vaccination, the basic skills it teaches are cheap and effective; unfortunately, the other characteristic they share is that they’re not really what photo ops are made of. We’ve also found a lot of resistance based on survivor bias: all too often, senior scientists who *have* managed to get something to work on a supercomputer say, “Well, I didn’t need version control or unit testing or any of that guff, so why should my students waste their time on it?” Most scientists (rightly) regard computing as a tax they have to pay in order to get results.

The evidence is that the problem of teaching everyone else about computer science is bigger than teaching computer science majors about computer science. Chris Scaffidi, Mary Shaw, and Brad Myers have estimated that, by 2012, there will be about 3 million professional software developers in the United States, but there will also be about 13 million end-user programmers: people who program as part of their work, but who do not primarily develop software. That is a ratio of a bit more than four to one, which suggests that for every student in your computer science classes, there are four more students who could use some help in learning computer science. Those scientists and engineers who will one day be programming are in those other four.

Brian Dorn and I have a paper in this year’s ACM International Computing Education Research workshop (in two weeks at Aarhus University) on his work studying graphic designers who program. Brian finds that these end-user programmers don’t know a lot about computer science, and that lack of knowledge hurts them. He finds that they mostly learn to program through Google. In his most recent work, he is finding that not knowing much about computer science means that they’re inefficient at searching. When they see "try-catch" in a piece of code that they’re trying to understand, they don’t know to look up "exception handling," and they can easily spend hours reading about Java exception handling when they are actually working in JavaScript.
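
To make the "try-catch" example concrete, here is a minimal sketch of the kind of code such a programmer might run into. It is written in TypeScript, which uses the same try/catch syntax as JavaScript. The function name parseOrDefault and its inputs are hypothetical, made up for illustration; they do not come from Brian's study. The only visible clues are the keywords try and catch, while the concept a reader would need to search for is "exception handling."

```typescript
// Hypothetical helper, invented for illustration only.
// It reads a number from a JSON string and falls back to a default on bad input.
function parseOrDefault(text: string, fallback: number): number {
  try {
    // JSON.parse throws a SyntaxError when the text is not valid JSON.
    const value: unknown = JSON.parse(text);
    // Valid JSON that is not a number also gets the fallback.
    return typeof value === "number" ? value : fallback;
  } catch (err) {
    // Control jumps here when anything inside the try block throws.
    // This jump is what "exception handling" refers to.
    return fallback;
  }
}
```

Under these assumptions, parseOrDefault("42", 0) yields the parsed number, while parseOrDefault("not json", 7) yields the fallback because the parse throws and the catch block takes over. Java has keywords that look the same but different rules about which exceptions must be declared, which is one way reading the wrong language's documentation can cost hours.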

Maybe we should be teaching scientists and engineers about computer science more generally. But as Greg Wilson points out, they don’t want much — they see computer science as a "tax." What’s the core of computer science that even scientists and engineers ought to know? Alan Kay recently suggested a "Triple Whammy" defining the core of computer science:

Matter can be made to remember, discriminate, decide, and do.
Matter can remember descriptions and interpret and act on them.
Matter can hold and interpret and act on descriptions that describe anything that matter can do.

That’s a pretty powerful set. That goes way beyond Python vs. Java, or using Perl to check genome sequences with regular expressions vs. using MATLAB for analyzing data from ecological simulations. How do we frame the "Triple Whammy" in a way that fledgling scientists and engineers would find valuable and learnable?