Software Quality

“There are two common aspects of quality: One of them has to do with the consideration of the quality of a thing as an objective reality independent of the existence of man. The other has to do with what we think, feel, or sense as a result of the objective reality. In other words, there is a subjective side of quality.” (W.A. Shewhart, 1931)⁶

Quality and dependability of software systems are among the greatest concerns of computing professionals. In the 1970s Barry Boehm,² James McCall,⁵ and their colleagues devised models for measuring software quality, which were eventually folded into the international standard ISO 9126.¹,⁴ The standard metrics measure objective characteristics of the software. But, as Shewhart noted long before software existed, quality is ultimately a subjective assessment made by humans reacting to artifacts. In other words, quality is in the eye of the beholder. Today much software falls short of the ISO standard and yet is more popular than systems that meet the standard. What standards of assessment do modern software users apply? I will propose a preliminary answer here.

Traditional Code-Level Standards

Software developers have traditionally sought to produce software that is dependable, reliable, usable, safe, and secure. They wanted to find objective measures to support these goals, attributes that can be observed and quantified in the software itself. They developed software quality measures based on the notion that quality is strongly related to rigor in the specifications and texts that appear throughout the software design process. Program construction techniques that maintain tight relationships between program structure and specifications are considered to be essential. The basic documents for software quality assessment list 20 measurable factors to assess overall quality:¹,²,⁴,⁵

correctness
reliability
integrity
usability
efficiency
maintainability
testability
interoperability
flexibility
reusability
portability
clarity
modifiability
documentation
resilience
understandability
validity
functionality
generality
economy

Each of these factors can be expanded and elaborated in detail, resulting in a complex set of rules for programmers to follow. No one said that quality is simple and straightforward.

Today there are huge markets for software in the form of apps downloadable to a mobile device or desktop and in the form of cloud services. The Android and Apple app stores offer approximately 1.5 million apps each. This is a different environment from the one in which the international standards were developed, when consumer software was relatively uncommon. Under pressure to beat competitors to market, software developers routinely deliver software with bugs. Moreover, many software systems are so big and complex that there is no hope of delivering them without bugs. For example, among operating systems, the Linux kernel has 15 million lines of code, Windows 10 has 50 million, MacOS 10.4 has 86 million, and the full open source Debian release of Linux has 420 million (all according to Wikipedia). These systems are riddled with flaws, which contribute to buggy application software and cyber exploits.

Emerging User-Level Standards

Let us consider how this looks from a user perspective. Users do not ask, “Is this software well structured by ISO 9126 standards?” but rather “Does this software help me get my work done? Can I depend on it?” What does it mean to the user to get work done or depend on software? There is a strong correlation between the user’s experience of satisfaction and the user’s assessment of quality. I see six distinct levels—four positive and two negative (see the accompanying table). I will discuss them from the bottom up.

Level -1: No trust. Users do not trust the software. It may be full of bugs, crash their systems, or carry malware. You might think users would avoid untrusted software. But users do often use untrusted software—for example, after being lured by fraudulent pitches, phishing, visits to compromised websites, overwhelming desires for convenience, and the like.

Level 0: Cynical satisfaction. Many users trust some but not all of the claims made by the software maker, yet they trust enough to be cynically willing to use it. Much software is released with bugs and security vulnerabilities, which the developers fix only after hearing user complaints and bug reports. User forums are rife with stories about how the software has failed them and with requests for workarounds and fixes; representatives of the developers are usually nowhere to be seen in these forums. A combination of factors facilitates this situation, including strong pushes to get something workable to market before the competition, a belief that users will tolerate many bugs, and a lack of liability codified in the license agreements users must accept to unlock software. This approach to software delivery is coming under fire because many of the bugs are also security vulnerabilities. Cynical users have no loyalty and will desert to another producer who makes a better offer.

Level 1: Software fulfills all basic promises. The user assesses that the producer has delivered exactly what was promised and agreed to. This might be called “basic integrity.” The ISO standard addresses this level well.

Level 2: Software fits environment. The user assesses that the software is a good fit for the user’s environment. This means several things. The practices and routines for using the software align with other practices and routines already in the environment; for example, because an ATM implements familiar practices of making bank transactions, users can use an ATM immediately without having to learn anything special or new. The software does not enable or encourage actions that violate social or cultural norms in the environment. The user has the experience that the software improves the user’s ability to get work done and to carry out important tasks.

Level 3: Software produces no negative consequences. After a period of use, the user has encountered no unforeseen problems that cause disruption or losses. The user assesses that the product’s design has been well thought out and that it anticipates problems that were not apparent at the outset.

Negative consequences can arise in numerous ways: The software carries vulnerabilities that can be exploited by hackers and malware. The software itself contains malware that can steal, damage, or destroy user data. The user attempts an action that was not intended or considered by the designers and the software misbehaves and damages the environment or data. A user makes a mistake with the software and there is no provision to back out to a previous good state. Over time, users develop new expectations that cannot be met by the current capabilities of the software. There can be unforeseen interactions between the many copies of the same software distributed throughout a network—for example, the stock market crash of 1987 occurred when a large number of computers programmed to sell when prices dropped by more than a preset amount automatically issued sell orders, driving prices down and triggering more selling by other computers. Operating system security vulnerabilities are another example: any or all of millions of systems can be attacked via a single vulnerability.
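The feedback loop in the automated-selling example can be illustrated with a toy simulation. This is only a minimal sketch and not a model of the 1987 market: the function name `simulate_cascade`, the per-trader thresholds, and the fixed price impact of each sale are all assumptions made for illustration.

```python
def simulate_cascade(start_price, shock, thresholds, impact):
    """Toy model of automated stop-loss selling (all parameters hypothetical).

    start_price: price before the initial shock
    shock:       fractional initial drop, e.g. 0.05 for 5%
    thresholds:  per-trader fractional drop (measured from start_price)
                 at which that trader's program issues a sell order
    impact:      fractional price drop caused by each automated sale
    Returns (final_price, number_of_traders_that_sold).
    """
    price = start_price * (1 - shock)
    sold = [False] * len(thresholds)
    triggered = True
    # Keep sweeping until a full pass triggers no new sell orders.
    while triggered:
        triggered = False
        for i, limit in enumerate(thresholds):
            drop = 1 - price / start_price
            if not sold[i] and drop >= limit:
                sold[i] = True
                price *= (1 - impact)  # each sale pushes the price lower
                triggered = True
    return price, sum(sold)


if __name__ == "__main__":
    traders = [0.04, 0.06, 0.08, 0.10]  # hypothetical sell thresholds
    print(simulate_cascade(100.0, 0.05, traders, 0.03))
    print(simulate_cascade(100.0, 0.03, traders, 0.03))
```

With these assumed thresholds and a 3% impact per sale, a 5% initial shock triggers all four traders in the sketch, whereas a 3% shock triggers none. Each program behaves exactly as written; the damage comes from the interaction among the copies.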

From long experience, a good software designer may include functions the user did not ask for but that will spare the user unwanted future problems. An example is the Apple Time Machine continuous backup system; the user can retrieve any previous version of a file and can transfer an entire file system to a new computer quickly. Another example is the Microsoft Security Development Lifecycle, a set of internal management controls that significantly reduced bugs and security vulnerabilities in Microsoft systems.³ The designer will continue to work with the customer after the software is installed in order to modify the software in case negative consequences are discovered. These actions (anticipation and continued availability after delivery) are essential for a software producer to earn the user’s satisfaction at this level.
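The idea of retaining earlier versions so that a user can back out to a previous good state can be sketched in a few lines. This is a hypothetical illustration only: the class `VersionedStore` and its methods are invented here and do not describe how Time Machine or any other product is implemented.

```python
class VersionedStore:
    """Keeps every saved version of each named document (illustrative only)."""

    def __init__(self):
        self._history = {}  # name -> list of saved contents, oldest first

    def save(self, name, content):
        """Record a new version; earlier versions are never overwritten."""
        self._history.setdefault(name, []).append(content)

    def current(self, name):
        """Return the latest version of the document."""
        return self._history[name][-1]

    def versions(self, name):
        """Return a copy of all saved versions, oldest first."""
        return list(self._history[name])

    def restore(self, name, index):
        """Back out to an earlier version by saving it again as the newest.

        Restoring appends rather than truncates, so the restore itself
        can also be undone.
        """
        self.save(name, self._history[name][index])
        return self.current(name)


if __name__ == "__main__":
    store = VersionedStore()
    store.save("report", "draft 1")
    store.save("report", "draft 2 (mistake)")
    print(store.restore("report", 0))
    print(store.versions("report"))
```

Appending on restore instead of deleting later versions is a design choice in this sketch: a user who restores the wrong version has not lost anything.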

Level 4: Software delights. At this level the product goes well beyond the user’s expectations and produces new, unexpected, sometimes surprising positive effects. The user expresses great delight with the product and often promotes it among others. The user feels the producer understands the user’s world and contributes to the user’s well-being.

Very few software systems have produced genuine delight. Some early examples include the Unix system, which was elegant and enabled powerful operations with simple commands; the Apple Macintosh, which brought an incredibly easy-to-use desktop with a bitmapped display; the DEC VAX VMS, which was amazingly stable and retained previous versions of files for fast recovery; VisiCalc, the first automated spreadsheet, which made easy accounting available to anyone; Lotus 1-2-3, a successor of VisiCalc, which enabled arbitrary formulas in cells and opened a new programming paradigm; and Microsoft Word, which made professional document formatting easy and eventually banished most other word processors from the market.

Recent examples include the iPhone and Android operating systems, which allow customizable access to millions (literally) of downloadable apps. Among the apps themselves some have attained high delight ratings; for example, many airlines, publishers, and newspapers offer apps that give direct access to their content via a mobile device. Some apps give users access to networks where data from many others is aggregated to give the user something that saves a lot of time and anxiety. For example, Amazon created the Kindle reader service that enables users to purchase e-books from the Amazon store and begin reading them instantly from any device with a Kindle app. Google and Apple maps use location information from smartphones to detect traffic congestion, overlay it on street maps, and propose alternate routes around congested areas. Blizzard Entertainment accumulated as many as 10 million subscribers to its World of Warcraft online game because of its rich complexity and realistic graphics. Uber allows users to hail rides whose drivers come to their exact location within minutes. In each case, customers found that the app let them do things that were previously impossible, and much more than they expected.

The interesting thing about these examples is that many failed important ISO metrics such as portability, speed, efficiency, or reliability. Yet people ignored those shortcomings and became avid and loyal customers of the software developer.

Software developers are banking on new delights as artificial intelligence technology matures. Many people are looking forward to driverless cars, personal assistants that know your daily routines and keep you from becoming forgetful, and virtual reality tools that allow you to tour distant places, train risk-free for a new skill or environment, or access new kinds of entertainment.

But delight is ephemeral if based on the software itself: Having mastered the new environment, the user will expand horizons and expect more. Few would find the original Unix, Macintosh, VMS, VisiCalc, or Word to be delightful today. Software producers now invest considerable effort into anticipating what will delight their users in the future. Their question has to be: Will we be able to provide delightful surprises for customers with growing expectations?

Conclusion

I have argued that software quality evaluation has transformed significantly from code-level measures of the 1970s to user-level assessments today. I proposed six levels at which users assess software quality. The levels reflect different degrees of emphasis on user satisfaction. Program correctness is essential but is limited to quality at the first level. The highest level—delight—arises in the context of the relationship between the customer and performer. The delighted customer will say that the performer has taken the trouble to understand the customer’s work and business, is available to help with problems and to seize opportunities, may share some risks on new ventures, and generally cares for the customer. Software producers today look to designs and services that produce genuine delight. When they succeed we witness new waves of killer apps.

Tables

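Table. Six User Levels of Software Quality Assessments

Level 4: Software delights
Level 3: Software produces no negative consequences
Level 2: Software fits environment
Level 1: Software fulfills all basic promises
Level 0: Cynical satisfaction
Level -1: No trust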

September 2016 Issue

Communications of the ACM (CACM) is now a fully Open Access publication.
