Why Don’t Languages Support Multimedia All the Way Down?

Georgia Institute of Technology professor Mark Guzdial

Donald Knuth gave the keynote talk at ITICSE 2003 on "Bottom-up Education." He argued that the hallmark of thinking like a computer scientist was being able to shift levels of abstraction, from the highest levels of application, all the way down to the bits, if necessary. He was arguing for his MMIX processor, but the same argument can be made in lots of different pedagogical contexts.

That's really what Barbara Ericson and I are doing in our Media Computation approach to introductory computing. Students today use digital media every day. They recognize manipulation of media as a relevant and useful activity. In our approach, we teach them to manipulate digital sounds at the sample level and digital pictures at the pixel level. They can then write simple loops to create Photoshop-like effects (like flipping an image or removing red-eye) or to create digital sound effects (like creating an echo, splicing, or reversing sounds). Manipulating pixels and samples is fun and easy; we've shown that it's a CS1-level activity. It's another case of manipulating the lowest levels of abstraction to create an effect at the application level.
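To make "simple loops" concrete, here is a minimal sketch of what such exercises might look like. It is not the Media Computation library: it assumes a picture is just a list of rows of (red, green, blue) tuples and a sound is just a list of integer samples, and the function names are hypothetical.

```python
# Assumption: a picture is a list of rows of (r, g, b) tuples and a sound is
# a list of signed 16-bit integer samples. These plain lists are stand-ins
# for illustration, not the classes used in the Media Computation books.

def mirror_horizontal(picture):
    """Return a copy of the picture flipped left to right, pixel by pixel."""
    flipped = []
    for row in picture:
        new_row = []
        # Walk the row from the last pixel back to the first.
        for x in range(len(row) - 1, -1, -1):
            new_row.append(row[x])
        flipped.append(new_row)
    return flipped


def reverse_sound(samples):
    """Return the samples in reverse order, so the sound plays backwards."""
    reversed_samples = []
    for i in range(len(samples) - 1, -1, -1):
        reversed_samples.append(samples[i])
    return reversed_samples


def add_echo(samples, delay, decay=0.5):
    """Mix a quieter copy of the sound, delayed by `delay` samples, into itself."""
    result = list(samples)
    for i in range(delay, len(samples)):
        mixed = samples[i] + int(decay * samples[i - delay])
        # Clamp to the signed 16-bit range so the mix cannot overflow.
        result[i] = max(-32768, min(32767, mixed))
    return result
```

Each function is a single loop (or a pair of nested loops) over the raw pixels or samples, which is the level of access the rest of this post is asking languages and libraries to provide.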

The problem is finding languages and libraries that support this level of access and manipulation. Sure, lots of languages can show pictures and play sounds — but that's getting stuck at Knuth's highest level of abstraction. How many languages and libraries, even those aimed at students, let you shift levels of abstraction with media?

Barbara and I wrote our books in Python and Java by cheating. Java does support shifting levels of abstraction, so we chose Jython, a version of Python that lets us reuse the classes that we wrote in Java. I've also been able to construct our Media Computation examples in Squeak, and Jennifer Burg has shown how easily it can be done in C. And that's really about it.

Our publisher has encouraged us to look into using Media Computation with other languages, especially Python 3.0. And that's where we run into problems. We can manipulate pixels; in fact, Nick Parlante at Stanford has started teaching JavaScript using Media Computation with pixel-level manipulations. Sound is the harder case: a recent review of audio packages for Python shows that none of them supports sample-level manipulations cross-platform. I've been able to write small examples in PyGame, but there are some significant bugs in that package. For example, if you open up a sound that is not CD-quality, PyGame "re-samples" the sound, so a sound that you open and save back out might double in size. If you care about the byte level, it's disconcerting for extra bytes to appear without warning.
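To see why re-sampling matters at the byte level, here is a rough sketch that uses only Python's standard `wave`, `array`, and `io` modules, not PyGame. It handles file access only and does not address playback. The helper names are hypothetical. The doubling case assumes an 8-bit sound converted to 16-bit at the same rate and channel count, which is one way a file could double in size and not necessarily what PyGame does internally.

```python
import array
import io
import wave


def write_wav(samples, rate=22050):
    """Pack unsigned 8-bit mono samples (0..255) into WAV bytes in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)   # mono
        out.setsampwidth(1)   # 1 byte per sample, so not CD-quality
        out.setframerate(rate)
        out.writeframes(array.array("B", samples).tobytes())
    return buffer.getvalue()


def read_wav(data):
    """Return (samples, sample_width, rate) from 8-bit mono WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as src:
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())
    return list(array.array("B", frames)), width, rate


def data_bytes(n_frames, channels, sample_width):
    """Size of the raw sample data: frames x channels x bytes per sample."""
    return n_frames * channels * sample_width


def reverse_wav(data):
    """Read a WAV, reverse its samples, and write it back without re-sampling."""
    samples, _width, rate = read_wav(data)
    return write_wav(samples[::-1], rate)
```

When nothing is re-sampled, `reverse_wav` returns exactly as many bytes as it was given. Under the stated assumption, `data_bytes(n, 1, 2)` is twice `data_bytes(n, 1, 1)`, which is the kind of silent growth described above.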

I have found no packages that let me do pixel-level and sample-level manipulations in other languages. There is a book on learning Haskell with multimedia, but it's all at the highest level of abstraction. I've tried to find such support for Scheme, but the only audio package I've found lets you play sounds without giving you access to the samples in those sounds. It's frustrating because, if a language or library supports playing the sounds, then those samples are somewhere there in memory. Let us at them!

Now, I'll bet that there are libraries out there for manipulating pixels of images and samples of sounds in many of these languages, but my experience suggests that they're neither obvious nor easy to find. Why not? Don't we think that Donald Knuth is right, and it really is important for CS students to be able to get all the way down easily and obviously, to understand how to build it all back up?

There is an argument that real application developers don't typically work at that level. Video game programmers leave the pixel and sample manipulations to the gaming engine. Most application developers just want to show pictures and play sounds and videos. But that doesn't excuse not providing access for students. Learning is a conscious process. It's so much easier to be conscious about things we can see. How do you study something that you can't see, that you can't manipulate? How do you learn samples and pixels if they're always hidden inside some library or engine? Sure, it's possible to learn things that are invisible, but it works much better if they are visible, accessible, and manipulable.

Media is something that I care about, but I wonder if it's an instance of a larger problem. It's important for students to shift levels of abstraction. How well do our languages for students support shifting levels of abstraction, that is, being able to see everything, from the application level down to the bytes? And if they don't, we should be asking "Why not?"