In CS Education, Educate the Educators First

In their point/counterpoint "CS Education in the U.S.: Heading in the Wrong Direction?" (July 2009), Robert Dewar wrote that the CS curriculum lacks important fundamentals, mostly of a mathematical nature, and Owen Astrachan wrote that "Studying more mathematics will not make software bugs disappear, although both [Edsger W.] Dijkstra and Dewar seem to think so," concluding implicitly that more mathematics in the CS curriculum is not necessary.

Astrachan apparently overlooked the distinction between necessary and sufficient conditions, a routine distinction in mathematics. As far as I know, neither Dijkstra nor Dewar ever claimed that mathematics is a "silver bullet" guaranteeing reliable design, and both would likely agree with Fred Brooks that indeed there is no silver bullet.

Dijkstra’s and Dewar’s argument that mathematics is an essential aspect of a proper CS education mirrors the fact that education in all engineering disciplines involves a solid mathematical foundation from the start, then used in all (other) courses; see my article "Teaching and Practicing Computer Science at the University Level" in Inroads, SIGCSE Bulletin 41, 2 (June 2009), 24–30.

As long as CS educators cannot agree about the fundamentals, practice will remain below the professional standards common in classical engineering; see Allen Tucker et al.’s "Our Curriculum Has Become Math-Phobic" in SIGCSE Bulletin 33, 1 (Mar. 2001), 243–247. This could be the result of declining CS student enrollment, possibly leading to the replacement of mathematics with, say, trendy topics apparently more appealing to freshmen. However, even trendy topics can be combined with a solid mathematical foundation, with the trendy topics included as illustrations. In any case, the emphasis in teaching such topics must be mathematical modeling, not mere description.

Unfortunately, teachers are divided, so combining theoretical CS expertise and practical experience in the same person is rare, unlike in other engineering disciplines. Educating the educators may well be a first priority.

Raymond Boute, Ghent, Belgium

Modular Programming Still a Challenge

In Leah Hoffmann’s interview "Liskov on Liskov" (July 2009) Barbara Liskov recalled research being conducted at the time she was beginning her career (1970). "When I started," she said, "the main way people thought about modularization was in terms of subroutines… But they didn’t have any way of linking a bunch of procedures together." She also said (in Karen A. Frenkel’s news item "Liskov’s Creative Joy," also July 2009) "…the major challenge still is how to build large software systems."

The definition of "module" inevitably affects the building of large software systems, prompting our own recollections and connections. For example, a counterexample to the use of subroutines as the basic module was a project begun in 1965 at Argonne National Laboratory (ANL) and reported at the Spring Joint Computer Conference (May 1969). Its goal was to develop the Argonne Reactor Computation System (ARC) from linked computational modules running on IBM System/360 computers under OS/360. The modules were FORTRAN IV main programs, with the output of one program analyzed to become the input for other programs. Reactor designers wanted three main features:

Superprograms. To link programs into superprograms directed by another FORTRAN IV main program while the original programs could still run on their own. They wanted all Assembler code and any direct communication with the manufacturer’s software to be isolated for porting to other hardware. Initially, the most important Assembler code was for LINKing and LOADing modules for coordinating the I/O stream and communicating with JCL;

New algorithms. To improve the modules through new-algorithm rewrites wherever possible; and

New modules. To make it possible to build modules as required.

ARC programmers each had five to 10 years of experience (rare in 1965) and advanced degrees in mathematics, science, and engineering. They coded in Absolute and Assembler on ANL-built hardware (AVIDAC and GEORGE) and in FORTRAN on the IBM 704 and the CDC 3600. Because OS/360 was so unstable (as shipped in 1965), they needed to be able to decipher dumps. Completion of the project would have been delayed without programmers capable of working close to the hardware and operating system.

The ARC system later became the platform for ANL reactor calculations and was studied and used by other laboratories worldwide, though not before it was shown to be portable.

On the first Earth Day (Apr. 22, 1970), a joint project with the Control Data Corporation set out to port the ARC system to CDC hardware. We were aided in this effort by Richard Lee of CDC and Larry Atkins, a Northwestern University engineering student also known for writing Chess 3.6, the winner of several ACM North America Computer Chess Championships in the 1960s and 1970s. Once the modular environment was ported, the computational modules were easily ported. In fact, after one computational module was ported, the remaining work was almost automatic.

Louis C. Just, Lakewood, CO
Gary K. Leaf, Argonne, IL
Bert J. Toppel, Argonne, IL

A Quart of NFE Solution for a Pint CPU Problem

In his article "Network Front-End Processors, Yet Again" (June 2009), Mike O’Dell seemed to be arguing what I say to colleagues in simpler terms, namely, that using a network front-end (NFE) processor is (and always has been) a bad idea for one basic reason: It must perform moderately complex computation at least as fast as a computer’s "main" CPU, yet manufacturers insist it be cheap due to the cost "multiplier" effect when a computer includes many network links. To show that solving this problem is impossible, consider the following proof by contradiction:

Assume an NFE processor with the requisite attributes (moderately general-purpose, fast, and cheap) exists. Most computer manufacturers would then use it as a main processor, not as a lowly NFE, and the resulting computer would be in need of even more network bandwidth.

Many computer engineers have long understood, as O’Dell wrote, that the most efficient way to implement a network-protocol software stack is to use one or more CPUs of an N-way SMP, but users strongly resist this idea when they discover they’ve paid big for what, from an application point of view, is only an (N-1)-way computer; note, too, popular "low-end" cases in which N = 2 or 4. Apparently, Sun Microsystems ("the network is the computer") hasn’t left much of an impression on IT managers. The result is that NFE startups continue to waste engineering talent by trying to pour a quart of technology into this particular pint jar.
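
The users' objection comes down to simple arithmetic. The following is a minimal sketch, assuming all CPUs in the SMP are identical and that a CPU dedicated to the protocol stack does no application work; the function name is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def application_share(n_cpus: int, network_cpus: int = 1) -> Fraction:
    """Fraction of an N-way SMP left for applications when some CPUs
    are dedicated to the network-protocol stack (illustrative model)."""
    if n_cpus < 1 or not 0 <= network_cpus <= n_cpus:
        raise ValueError("need n_cpus >= 1 and 0 <= network_cpus <= n_cpus")
    return Fraction(n_cpus - network_cpus, n_cpus)

# The "low-end" cases named in the letter, plus a larger machine for contrast.
for n in (2, 4, 16):
    print(f"N={n}: application share = {application_share(n)}")
```

Under these assumptions, a buyer gives up half the machine when N = 2 and a quarter when N = 4, which is why the resistance is strongest at the low end.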

Scott Marovich, Palo Alto, CA

How to Address Big Data

I want to thank Adam Jacobs for cataloging the important issues in the management of large volumes of data in his article "The Pathologies of Big Data" (Aug. 2009). But please know that innovations are also being made by relational database vendors. One recently released product is the HP Oracle Database Machine (http://www.oracle.com/database/database-machine.html), which processes large volumes of data using massive horizontal parallelism across a shared-nothing storage grid. Pipeline (vertical) parallelism is enabled by offloading data processing to the storage grid, so the amount of data that must be shipped back from the storage grid to the database grid is reduced as well. The storage grid and database grid are connected through a high-speed InfiniBand interconnect.

A single DM has an I/O bandwidth of 14GB/sec in the first version of the product (with uncompressed data). If the data is compressed, the effective bandwidth is much greater, depending on compression ratio. (Jacobs’s experiment would have run much faster on the DM.) Multiple DMs can be connected to increase data capacity, along with corresponding network/computing capacity.
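
As a rough illustration of how compression raises effective bandwidth, here is a minimal sketch. It assumes a scan is purely I/O-bound and that decompression costs nothing; the compression ratios and data size are hypothetical, and only the 14GB/sec figure comes from the text above.

```python
def effective_bandwidth_gb_s(raw_gb_s: float, compression_ratio: float) -> float:
    """Logical (uncompressed) data scanned per second, given the raw I/O
    bandwidth and the ratio of uncompressed size to compressed size."""
    if raw_gb_s <= 0 or compression_ratio < 1:
        raise ValueError("need raw_gb_s > 0 and compression_ratio >= 1")
    return raw_gb_s * compression_ratio

def scan_seconds(logical_gb: float, raw_gb_s: float = 14.0,
                 compression_ratio: float = 1.0) -> float:
    """Time to scan logical_gb of data under the I/O-bound assumption."""
    return logical_gb / effective_bandwidth_gb_s(raw_gb_s, compression_ratio)

# Hypothetical 1,400GB table at several assumed compression ratios.
for ratio in (1.0, 2.0, 4.0):
    print(f"ratio {ratio}: {scan_seconds(1400.0, compression_ratio=ratio)} s")
```

In practice, CPU time spent on decompression and query processing would reduce the gain, so these figures are an upper bound under the stated assumptions.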

Oracle’s Automatic Storage Management (ASM) is an integral part of DM, providing automatic load balancing across all nodes in the storage grid. ASM also provides fault-tolerance through dual or triple mirroring. The Oracle database provides high-availability and disaster-recovery features, and ASM enables sequential I/Os through its allocation strategies (such as large allocations).

One of the best ways to improve query performance (assuming the optimal access method is used) is to avoid I/O altogether. The Oracle database provides rich partitioning strategies that enable a query scan to skip large chunks of data that do not qualify.
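
The general idea of skipping partitions can be sketched as follows. This is an illustrative model of range-partition pruning, not Oracle's implementation; the `Partition` class, the `prune` function, and the sample quarterly data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Partition:
    name: str
    low: date    # inclusive lower bound of the partition key
    high: date   # exclusive upper bound of the partition key
    rows: int

def prune(partitions, query_low: date, query_high: date):
    """Keep only partitions whose key range overlaps [query_low, query_high).
    Every other partition is skipped without any I/O."""
    return [p for p in partitions if p.low < query_high and query_low < p.high]

# Hypothetical table partitioned by quarter.
QUARTERS_2009 = [
    Partition("2009_Q1", date(2009, 1, 1), date(2009, 4, 1), 1_000_000),
    Partition("2009_Q2", date(2009, 4, 1), date(2009, 7, 1), 1_000_000),
    Partition("2009_Q3", date(2009, 7, 1), date(2009, 10, 1), 1_000_000),
    Partition("2009_Q4", date(2009, 10, 1), date(2010, 1, 1), 1_000_000),
]

# A query restricted to July 2009 needs to read only one partition.
to_scan = prune(QUARTERS_2009, date(2009, 7, 1), date(2009, 8, 1))
print([p.name for p in to_scan])
```

With this sample data, a query restricted to July 2009 touches one partition out of four, so three quarters of the rows are never read.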

Moving data from production (OLTP systems) to a specialized data store adds to a system’s total cost of ownership, as one would otherwise be managing two different data stores with poor or no integration. DM solves the big-data problem without a special data store just for data analysis; DM provides a single view of the data.

Though I am a technical member of the Oracle Exadata development team, my aim here is not to plug the product but to report that the big-data problem is indeed being tackled, particularly by relational database vendors.

Umesh Panchaksharaiah, Richmond, CA

My Generation

Samuel Greengard’s news story "Are We Losing Our Ability to Think Critically?" (July 2009) is inspiring as a basis for future work. I am coordinating an interdisciplinary seminar on the collective construction of knowledge (http://seminario.edusol.info, in Spanish), including two topics Greengard might be able to bridge: One is free software in a democratic society, inciting people to be more politically active and involved, despite being (usually) independent of political parties and other traditional means of shaping society. The other is how motivation and peer-recognition function in these communities; such free-culture communities have much in common with scientific communities, despite starting off with completely different motivations.

My generation (born in the 1970s), including many people in the free software movement, has directly experienced the great shift computing and networking have brought the world, fully embracing the technologies. The greatest difference between people who are just users of computing and those striving to make it better depends on who has the opportunity to appropriate it beyond, say, the distraction level, the blind Google syndrome, or the simple digestion of "piles of data and information [that] do not equate to greater knowledge and better decision making."

Thanks to Greengard for sparking some useful thoughts.

Gunnar Wolf, Mexico City

Footnotes

Communications welcomes your opinion. To submit a Letter to the Editor, please limit your comments to 500 words or less and send to letters@cacm.acm.org.

DOI: http://doi.acm.org/10.1145/1610252.1610255


December 2009 Issue
