Forum

Embrace the Engineering Metaphor

As Wei-Lung Wang points out in his "Viewpoint" ("Beware the Engineering Metaphor," May 2002), there are many similarities between the work of the software developer and that of the mathematician. However, I question the central thesis of the column, namely that "building software is inherently not an engineering task." The evidence Wang provides is that conventional engineering disciplines, including electrical and mechanical, are based on immutable laws of nature, and that software lacks a similar fixed framework. But a closer look reveals that software does have a multitude of fixed frameworks, known as domains, by which it is constrained.

Domains are specific areas of knowledge, and their importance to software development has only belatedly been recognized. Domains range from the lowest-level "implementation" domains, such as Java and x86 assembly, up to application-level domains, such as accounting, airline scheduling, payroll, and employment. In fact, all software can be viewed as an implementation of some subset of one or more domains.

Every domain has its constraints, or "laws." For example, in the domain of linked lists, accessing any element requires traversing all the elements that precede it, and the final element has nothing following it. When these domain constraints are not observed, the program fails. The failure may show up as an exception, an illegal memory access, or garbage data, leading some people to believe software is merely a "collection of instructions."
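To make the "law" concrete, here is a minimal sketch in Python (an illustration only, not anything from the letter; the Node class and nth function are invented for the example):

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = Node(1, Node(2, Node(3)))   # the list 1 -> 2 -> 3; the last node's next is None

def nth(head, n):
    """Return the n-th element (0-based) by walking from the head."""
    node = head
    for _ in range(n):
        node = node.next           # each step assumes a successor exists
    return node.value

print(nth(head, 2))                # 3: reached only by traversing the two preceding elements
try:
    nth(head, 3)                   # walks past the final element
except AttributeError as err:
    print("constraint violated:", err)
```

Violating the constraint here surfaces as an exception; in a lower-level implementation domain the same mistake would surface as an illegal memory access or garbage data.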

While the laws of a domain, such as double-entry bookkeeping, may not be immutable laws of nature, ignoring them nevertheless produces painful financial consequences. Similarly, ignoring the rules of a domain, like application security, leads to server shutdowns and loss of confidential information. Even softer domains, such as human-computer interaction, have laws that, when ignored, lead to unusable user interfaces.

Regarding theoretical formalization, many of the higher-level domains have so far resisted effective and usable formalization. But this does not mean these domains lack well-defined rules and constraints. The challenge to the domain engineering community is to produce formalizations the software community can use. Such formal domain theories would allow exactly the kind of manipulation available to other engineering disciplines.

Finally, the reason why attempts to build reusable components and libraries often fail has everything to do with the misperception that components exist in isolation. This is another place where domains can be expected to be useful. In fact, they are considered by some the key to effective reuse.

Srinivas Nedunuri
Austin, TX

Credit for Crypto’s Parallel Development

In his article on the social and cultural influences that went into the development of public-key cryptography ("Enabling Crypto: How Radical Innovations Occur," Apr. 2002), Arnd Weber completely ignores the fact that the technology was invented twice, independently, by groups in radically different spheres who knew almost nothing of each other's work. He describes the background to the contributions of Whitfield Diffie, Martin Hellman, and Ralph Merkle in the mid-1970s, and concludes that academic leftist and libertarian traditions in the U.S. were instrumental in establishing the mindset that led to public-key cryptography.

There is a parallel story, described in The Code Book (S. Singh, Fourth Estate, London, 1999), that, had Weber chosen to acknowledge it, would have allowed him to produce a comparative and more penetrating study.

In April 1969, the British Ministry of Defence asked James Ellis to look into the key distribution problem. Ellis worked at the Government Communications Headquarters (GCHQ) in Cheltenham, then, as now, one of the most secret places in the U.K. GCHQ was formed after WWII from the remains of the Turing group at Bletchley Park that cracked the Enigma code.

Ellis proved public-key cryptography (or "nonsecret encryption," as he called it) was possible, and he developed the concept of public and private keys. He also knew he had to find a trapdoor one-way function to implement his idea. But he was a physicist, not a mathematician, so couldn’t solve this problem. He revealed what he had to his bosses at GCHQ.

The importance of Ellis’s invention was recognized. Then, in 1973, Clifford Cocks, a recent Cambridge graduate, realized that prime numbers and factorization were one way to make a trapdoor, and he had the solution—essentially what Rivest, Shamir, and Adleman were to come up with four years later.
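The trapdoor Cocks found in factorization can be sketched with toy numbers (an illustration only; the tiny primes and message below are invented for this sketch, not drawn from any of the work discussed):

```python
# Toy sketch of the factorization trapdoor that later became RSA (Python 3.8+).
p, q = 61, 53                        # the secret primes: the "trapdoor" knowledge
n = p * q                            # 3233, published together with e
e = 17                               # public encryption exponent
d = pow(e, -1, (p - 1) * (q - 1))    # decryption exponent; deriving it requires p and q

m = 42                               # a message encoded as a number smaller than n
c = pow(m, e, n)                     # anyone can encrypt using only (n, e)
assert pow(c, d, n) == m             # decryption needs d, and hence the factors of n
print("ciphertext:", c, "decrypted:", pow(c, d, n))
```

Publishing (n, e) lets anyone encrypt, while recovering d from them alone requires factoring n, which is why multiplying large primes works as a one-way function with a trapdoor.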

Ellis was a gentle polymath of outstanding intellect, but he worked in secret, talking only to a few colleagues. He died in November 1997, age 73, just one month before GCHQ went public about his and his associates’ achievement. By that time the rest of us were using open-source, 1024-bit PGP to encrypt everything from exam papers to love letters.

The GCHQ team arrived there first, but if not for Diffie, Hellman, and Merkle, public-key cryptography would probably still be classified and used solely by NATO's armed forces. The academic mindset and what Weber calls "enabling structures consist[ing] of the libertarian and leftist traditions in the U.S., social movements, a tradition of self-confidence, and an innovative industry" were not necessary for its creation; it was first invented by a secret government team in response to a military request. Those in that leftist, libertarian tradition then re-created it independently, later, and in public.

Adrian Bowyer
Bath, England, U.K.

Weber Responds:
Let me explain why I chose not to mention Ellis's work.

1) Secret developments are of less interest to Communications readers. Because of the secrecy of the possible GCHQ work, it had no practical relevance I am aware of. I am certain it has had no significance for encrypting or authenticating any communication by private persons or businesses to date.

2) I find it difficult to state as established that achievements claimed by secret services staff, roughly six years before Diffie's and Merkle's work, actually took place. The only evidence I was able to identify consists of electronic documents on a Web site. This is why I did not want to write, as Bowyer does, that it is a "fact" or that Ellis "proved" at that time that public-key cryptography was possible. According to Singh, these things may well have taken place, but I did not want to convey the impression that I have scientifically usable evidence. For similar reasons I omitted NSA's claim to have known of public-key cryptography even a decade earlier than Diffie (Bobby Inman, according to Gus Simmons in the Encyclopaedia Britannica, 1986). Also, Singh mentions that, according to Hellman, the development deserves just a "footnote."

3) As I indicated in the article, an invention such as public-key cryptography can be accepted with less social and political motivation, as it was by Merkle. In the interview with Diffie (on my Web site) and in the version I submitted to Communications, I mentioned that Peter Deutsch had introduced the principle. But this is not how the invention became known and turned into running systems.

4) Finally, Communications limits the number of references in an article. Hence, I reduced the list from 45 to 11. Unfortunately, this caused Ellis, Inman, Simmons, and a related article by Hayward to disappear from the printed version.

Enough PDF; Give Me HTML

I read Bertrand Meyer’s letter ("Forum," May 2002) with dismay. Has Meyer been persuaded by commercial software vendors to believe documents in PDF format are "universally understood by browsers"?

While I, too, applaud authors who make their work available on the Web in its native HTML format, I also welcome the availability of documents that can readily be printed in a form that most accurately reproduces the wishes of the publisher and can be used by the greatest possible number of users. Today only Postscript meets that criterion; it is still the most universal document distribution format where printing is the ultimate goal.

Although documents in PDF format are theoretically widely usable, this is not the reality many of us find today. Indeed, no browser I know directly "understands" PDFs. Only a tiny minority of browsers are able to use external programs to pretend they directly display PDF content. Although Adobe’s PDF reader is freely available, it is not universally available across all computing platforms, and is not available in source form. It’s also a bloated program requiring rather substantial computing resources to run. Finally, many of us lack the facilities needed to run commercial software, such as Adobe Distiller.

On the other hand, almost all modern Web browsers inherently understand gzipped content; it's an optional feature of the HTTP protocol itself. Indeed, at least one of the most popular semicommercial browsers will automatically uncompress any gzipped content as it retrieves it. There are also dozens of implementations of programs that can decompress gzipped content. I cannot begin to understand Meyer's complaint that gzipped content is even more difficult, or impossible, to access, even from within a browser. Meanwhile, gzipped content reduces the bandwidth required to transmit a copy of a document, and thus such compression can save us all bandwidth costs, as well as help reduce Internet congestion overall.
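As an illustration of that protocol feature (a sketch only; the URL is hypothetical): an HTTP/1.1 client advertises gzip support in its Accept-Encoding request header and decompresses the body only when the server answers with Content-Encoding: gzip.

```python
import gzip
import urllib.request

req = urllib.request.Request(
    "http://example.org/paper.html",        # hypothetical URL, for illustration only
    headers={"Accept-Encoding": "gzip"},    # client: "I understand gzip"
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()
    if resp.headers.get("Content-Encoding") == "gzip":  # server applied gzip coding
        body = gzip.decompress(body)

print(body[:200])
```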

I can usually view PDF-formatted content on my screen, though not nearly as conveniently as I view HTML or plain text. Postscript-formatted documents invariably display more reliably and efficiently than PDFs.

The bigger problem is that I cannot always print PDF-formatted content. However, I can almost always print original Postscript-formatted documents. The transformation Meyer would have authors perform is not always reversible (and is certainly not easily reversible). Should they perform that transformation and make the original Postscript document unavailable to people like me, they would sometimes make their documents impossible for me to use at all.

Even if PDF-viewing and conversion software were universally available and reliable, it still doesn’t solve the printing issue for those of us interested in producing high-quality print output. PDF files are often rendered at low resolutions, lower even than the resolution of the screen displays some of us use for our daily work. PDFs rendered for even the common 600dpi resolution available with many home and personal printers would be many times larger than original Postscript files. Adobe says much the same thing in its white paper comparing Postscript and PDF.

I implore authors who publish their works on the Internet to ignore most of Meyer’s words and at least provide Postscript-formatted content (preferably in its compressed form to save us bandwidth costs). If possible, I would also like authors to publish online versions of their works in standard universal HTML as well. However, I am well aware of the limits of publishing in HTML, especially when the primary goal is to produce a printable document, and I am quite willing to forgive authors who cannot easily publish their works in decent HTML form.

Indeed, in the modern world where I encounter too much unreadable PDF content, the freely available Google search service often comes to my rescue by providing its own best-effort HTML translation. Were it not for Google, much of the Web’s content would not only be unsearchable, but often completely inaccessible.

Greg A. Woods
North York, ON, Canada

I agree entirely that the .ps.gz format has been made obsolete by the more universally readable PDF format (please excuse the redundancy). Though in defense of those "disrespectful" authors still using .ps.gz, I might point out that, just as Meyer has difficulty reading that format on his platforms, the authors who use .ps.gz may not be able to run Adobe Distiller on theirs.

However, Meyer asserts that the original need for compressed formats (saving space) is now obsolete due to the availability of cheap storage. While this may be true, the driving need for compression today is bandwidth. Far too much bandwidth and money are being wasted on the transmission of uncompressed data. While image and video formats are compressed, a substantial portion of Internet bandwidth is spent on HTML and email, which are almost never compressed. Compression not only saves bandwidth but, perhaps counterintuitively, also saves CPU resources on the machine doing the compression. The reason is that the data is repackaged a few times before going out over the wire, requiring that it be copied from one part of memory to another each time. If the data is compressed, the time saved making those copies can pay for the time spent compressing with, for example, gzip or zlib. With the proper settings, gzip or zlib can compress quite effectively with relatively few CPU cycles. Other, more modern methods may be able to approach those speeds with even more effective compression.
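To illustrate the trade-off (a sketch only, using Python's zlib bindings and made-up repetitive text; actual ratios depend on the data): even the fastest compression level shrinks HTML-like text substantially.

```python
import zlib

text = ("<p>Communications of the ACM, Forum, May 2002.</p>\n" * 200).encode()

for level in (1, 6, 9):                      # fastest, default, most thorough
    compressed = zlib.compress(text, level)
    print(f"level {level}: {len(text)} -> {len(compressed)} bytes "
          f"({len(compressed) / len(text):.1%} of original)")
```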

Interestingly, the PDF format is already compressed, at least since version 1.3. It uses the same deflate format used by gzip and zlib to compress Postscript and other content. So, in fact, it is not necessary for Meyer to invoke cheap storage as a defense for not using the .ps.gz format. The PDF format replaces the functionality of both .ps and .ps.gz by providing both the typesetting and the compression.

Meyer also suggests the use of HTML for the distribution of papers. This format can be compressed as well, and the commonly used browsers all support gzip decompression of HTML, as well as of any other HTTP content, including style sheets, when the server indicates it in the Content-Encoding response header, as specified in the HTTP 1.1 standard (RFC 2616). An Apache server using mod_gzip or mod_deflate can serve gzipped pages even when they are dynamically generated, though this is not done as often as it should be.
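The server side of this can be sketched in a few lines (an illustration only, using Python's standard library as a stand-in for Apache's mod_gzip or mod_deflate; the page content and port are invented): gzip the HTML body whenever the request's Accept-Encoding header permits it, and say so in the Content-Encoding response header.

```python
import gzip
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Forum</h1><p>Enough PDF; give me HTML.</p></body></html>"

class GzipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE
        wants_gzip = "gzip" in self.headers.get("Accept-Encoding", "")
        if wants_gzip:
            body = gzip.compress(body)                    # compress the response body
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        if wants_gzip:
            self.send_header("Content-Encoding", "gzip")  # per RFC 2616
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), GzipHandler).serve_forever()
```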

So I disagree with Meyer that compression is no longer important, but I wholeheartedly support his suggestion of a more purposeful transition to PDF and HTML as standards for article distribution on the Web, because they already include or support compression.

Mark Adler
Pasadena, CA
