Thumb Numbers

Rules of thumb are common and often very helpful. They convey bits of wisdom accumulated by many people over a long time. Parkinson’s Law warns that things seldom get done early: “Work expands to fill the time available for its completion.” Murphy’s Law warns against complacency: “Whatever can go wrong, will.” Hofstadter’s Law captures the project planner’s conundrum: “Projects always take twice as long as planned, even when this law is taken into account.” Even the self-contradictory adage, “The exception proves the rule,” is taken as a rule of thumb.

Rules of thumb seem much more concrete, authoritative, and universal when they contain numbers. I have picked three examples to examine here. Despite their popularity, they are very shaky as general rules:

80-20 Rule: 80% of production comes from 20% of producers.
7 Chunks Rule: Our mental span of control is limited to about 7 chunks.
7% Content Rule: The credibility of your message depends 93% on your tone of voice and body language, and only 7% on the content of your words.

Even as heuristics, these “rules” are not very reliable. We can only have confidence that a rule applies when we have data to ground our conclusions.

The 80-20 Rule

The 80-20 rule is a statement about a population of unequal producers. It says if you rank order the members from largest to smallest productivity, you will find that 80% of the production comes from the first 20% of the ranking. The rule is named after the Italian economist Vilfredo Pareto. In 1906 he observed that 80% of the land in Italy was owned by 20% of the population. He also observed in his garden that 80% of the peas came from 20% of the pods. The Pareto effect has been observed in many other cases and has led to statements of the following kinds in many fields:

About 80% of the world’s GDP is produced by 20% of the countries (United Nations Development Program, 1992).
About 80% of your profits (or complaints!) come from 20% of your customers.
About 80% of injuries come from 20% of known hazards.
About 80% of crimes are committed by 20% of criminals.
About 80% of the vulnerabilities in a critical infrastructure reside in 20% of the nodes.
About 80% of bug reports will be eliminated by fixing the top 20% of known bugs (Microsoft Security Development Lifecycle).
About 80% of Internet traffic goes to 20% of the nodes.
About 80% of your software specifications can be implemented with only 20% of the full-project effort (the rapid prototyper’s motto).

There are many more examples using different ratios. For example, “You will be most successful at weight loss if 90% of your foods are healthy and 10% are ‘fun’ foods.” And the ever popular, “About 99% of the wealth is held by 1% of the taxpayers.” Without much searching, you will easily find many more examples like these.

Some of these claims are validated by data, but many are simply asserted based on an assumption that the Pareto principle is universal. Is it?

To answer that question, let’s go back to basics. We are given a set of producers. Associated with each producer x is a(x), the amount that x has produced. The measure a(x) can also be the number of occurrences (frequency) of x, because x is “producing” occurrences. The producers have been ranked (labeled) in order of decreasing production, so that x=1 is the largest producer. If A is the total amount from everyone, we can define the proportion of production from x as p(x) = a(x)/A. Statisticians call these distributions “Pareto distributions” when p(x) decays in proportion to x^-a, where the exponent a is a parameter (distinct from the amount a(x)).

In the special case a=1, the frequency (production) of x is proportional to 1/x and the distribution is called Zipf’s Law. Zipf’s Law is named after George Kingsley Zipf (1902–1950), who observed that in compilations of words from various languages sorted by decreasing frequency, the frequency of any word tends to be inversely proportional to its rank.

Another common case is a=2, in which case the distribution is often called a power law (or more precisely “inverse square power law”). In this case, doubling the rank cuts production to a quarter. An example is Internet connectivity, where p(x) is the relative number of nodes that have x connections to other nodes.²

The accompanying figure shows two data sets plotted on log-log scales. When the data on a log-log graph fall on a straight line of slope –a, they obey a power law with parameter a. The upper line plots the data of a Zipf Law (a=1) and the lower line a power law (with a=2). In other words, we can confirm how closely our data follow a Pareto distribution by plotting them on a log-log graph and seeing how closely they follow a straight line.
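The straight-line check described above can also be done numerically. The following is a minimal sketch, not the method used to produce the figure: it fits a least-squares line to log(production) against log(rank) and reports the slope together with R², a common measure of how closely the points follow the line. The function name `fit_loglog_line` and the two synthetic data sets are assumptions made for illustration, and the sketch assumes every value is positive and that the values are not all equal.

```python
import math

def fit_loglog_line(values):
    """Fit log(value) = intercept + slope * log(rank) by least squares.

    `values` must be positive and listed in rank order (rank 1 first).
    Returns (slope, r_squared); a slope near -a with r_squared near 1
    suggests a power law with parameter a.
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Synthetic, exactly power-law data (illustrative only, not the figure's data).
zipf = [1000 / rank for rank in range(1, 51)]                # a = 1
inverse_square = [1000 / rank ** 2 for rank in range(1, 51)]  # a = 2

for name, data in (("Zipf", zipf), ("inverse square", inverse_square)):
    slope, r_squared = fit_loglog_line(data)
    print(f"{name}: slope {slope:.3f}, R^2 {r_squared:.4f}")
```

Real measurements will scatter around the line, so the fitted slope is only an estimate of a, and a low R² is a warning that a power law may not describe the data at all.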

The 80-20 cutoff point is not a general feature of Pareto distributions. In the data for the figure, the Zipf-Law data displayed an 80-54 cutoff, and the power-law data displayed an 80-4 cutoff. Only when a=1.32 did the 80-20 rule work exactly. For continuous data, a=1.16 makes it work (Wikipedia).
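To see how the cutoff moves with the parameter a, one can generate ranked production figures proportional to x^-a and count how many top producers are needed to reach 80% of the total. The sketch below does this under stated assumptions: the helper names `power_law_amounts` and `cutoff_fraction` are hypothetical, and the population sizes are arbitrary choices, since the size of the data sets behind the figure is not given here. Its output is therefore not expected to reproduce the 80-54 and 80-4 values quoted above.

```python
def power_law_amounts(n, a):
    """Hypothetical production figures for n ranked producers, proportional to rank**-a."""
    return [rank ** -a for rank in range(1, n + 1)]

def cutoff_fraction(amounts, share=0.8):
    """Smallest fraction of top producers whose combined output reaches `share` of the total."""
    ranked = sorted(amounts, reverse=True)
    target = share * sum(ranked)
    running = 0.0
    for count, amount in enumerate(ranked, start=1):
        running += amount
        if running >= target:
            return count / len(ranked)
    return 1.0  # only reached if rounding keeps the running sum just below the target

# Population sizes are assumptions chosen for illustration.
for n in (100, 1000):
    for a in (1.0, 1.32, 2.0):
        pct = 100 * cutoff_fraction(power_law_amounts(n, a))
        print(f"n={n}, a={a}: 80% of production comes from the top {pct:.1f}% of producers")
```

Running it for several population sizes shows that, for finite ranked data, the cutoff depends on the number of producers as well as on a, which is one more reason not to treat 80-20 as a fixed property.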

So the “80-20 rule” is little more than a folk theorem based on a few prominent cases. It holds only for a small subset of Pareto distributions. It is not a universal law. Thus, for instance, network scientists who believe that disabling a handful of highly connected “hub” routers could shut down the Internet are mistaken because the real Internet is engineered for more redundancy.¹ Similarly, rapid prototypers who believe that 80% of the specifications can be completed with 20% of the total project effort are on shaky ground.

When good data are available, it is easy to calculate a cutoff point and put it to good use. For instance, an engineer might have discovered that 10% of the electric-grid nodes account for 90% of the vulnerabilities; hardening those nodes is a good use of limited infrastructure protection funds.
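With measured data in hand, the calculation amounts to sorting and accumulating. The sketch below assumes a table of vulnerability counts per node; the node names, the counts, and the function name `nodes_to_harden` are all made up for illustration. It returns the shortest list of top-ranked nodes that covers a chosen percentage of the vulnerabilities, using integer arithmetic for the comparison so that rounding cannot shift the cutoff.

```python
def nodes_to_harden(vulnerabilities, share_percent=90):
    """Return (nodes, covered_fraction) for the shortest top-ranked list of nodes
    whose vulnerabilities add up to at least `share_percent` of the total."""
    ranked = sorted(vulnerabilities.items(), key=lambda item: item[1], reverse=True)
    total = sum(vulnerabilities.values())
    chosen = []
    covered = 0
    for node, count in ranked:
        if covered * 100 >= share_percent * total:
            break
        chosen.append(node)
        covered += count
    return chosen, covered / total

# Made-up counts for ten hypothetical grid nodes.
counts = {
    "N01": 91, "N02": 3, "N03": 2, "N04": 1, "N05": 1,
    "N06": 1, "N07": 1, "N08": 0, "N09": 0, "N10": 0,
}

nodes, covered = nodes_to_harden(counts)
print(f"Harden {len(nodes)} of {len(counts)} nodes ({nodes}) "
      f"to cover {covered:.0%} of known vulnerabilities")
```

The point of the calculation is that the cutoff comes out of the data; with a different table the same code might report 30-90 or 5-90.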

While we may have trouble predicting where inequality of production arises or what causes it, we can be sure that almost all the time production will be unequal. Enterprise software expert Rick Hayes-Roth told me of a practical way, inspired by Alan Lakein,³ to exploit inequality for software development. Make a list of all the outcomes your software is to produce. Rank each one as priority A (top), B (middle), or C (low) with the constraints that at least 1/3 be rated C and at most 1/3 be rated A. Then ignore B and C tasks and do only A tasks. Customer groups can produce these rankings by giving everyone enough votes to cover 1/3 of the list of possible outcomes. Hayes-Roth says that when time pressures increase (as when technology accelerates), the rewards of this approach increase. Getting the A priorities delivered on time will keep you alive, while chasing after B’s and C’s will sink you.
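The voting procedure can be sketched in a few lines. The text above does not say how votes are turned into ranks, so the mapping here is an assumption: outcomes are sorted by vote count, the top third (rounded down) become A, the bottom third (rounded up) become C, and the rest become B, which respects both constraints. The function name `abc_ranking`, the sample outcomes, and the ballots are hypothetical, and the sketch assumes a list of at least three outcomes.

```python
import math
from collections import Counter

def abc_ranking(outcomes, ballots):
    """Assign A/B/C priorities from ballots.

    Each ballot is a list of outcome names; a voter may cast votes for
    at most one third of the list. Ties keep the original list order.
    """
    n = len(outcomes)
    votes_per_person = n // 3
    tally = Counter()
    for ballot in ballots:
        if len(ballot) > votes_per_person:
            raise ValueError("each voter may cast votes for at most one third of the list")
        tally.update(ballot)
    ranked = sorted(outcomes, key=lambda outcome: tally[outcome], reverse=True)
    a_count = n // 3             # at most one third rated A
    c_count = math.ceil(n / 3)   # at least one third rated C
    return {
        "A": ranked[:a_count],
        "B": ranked[a_count:n - c_count],
        "C": ranked[n - c_count:],
    }

# Hypothetical outcomes and ballots (two votes each, since the list has six items).
outcomes = ["login", "search", "export", "reports", "themes", "chat"]
ballots = [
    ["login", "search"],
    ["login", "export"],
    ["search", "login"],
    ["reports", "search"],
]
priorities = abc_ranking(outcomes, ballots)
print("Do now:", priorities["A"])
print("Ignore for now:", priorities["B"] + priorities["C"])
```

Limiting each voter to a third of the list is what forces the inequality to surface: nobody can declare everything a top priority.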

The 7 Chunks Rule

In 1956 psychologist George Miller published in Psychological Review a study called “The magical number seven, plus or minus two,” where he found that most people can remember between five and nine “chunks” of information in their short-term memories.

Miller’s paper became very popular. It led to popular notions such as the idea that a manager should have about seven direct reports, and that any more than that becomes unmanageable. The idea that management span of control is ideally around seven has been taken as a law even though there is little data to support it. Some managers have shown great skill with many more than seven reports, and others have trouble with two or three.

Later studies of human memory have shown that people can learn to remember many more than seven items after being trained in memorization methods. They form hierarchies grouping chunks at successive levels, all linked together by stories and substories. The real lesson is not that short-term memory is a limitation, but that people’s memories are more powerful when they incorporate more meaning, relationships, and context.

The 7% Content Rule

In 1971, Albert Mehrabian of the UCLA business school published a book about factors that lead customers to like or dislike a salesperson.⁴ He concluded that the words of the sales message account for 7% of the “like” assessments, tone of voice for 38%, and body language for 55%. People are especially sensitive to incongruities, for example someone claiming to have no complaint about you while avoiding eye contact with you. In that case, listeners tend to go with their sense of voice and body language rather than the content of the words spoken.

Many people have seized on this study as proof that nonverbal communication is more important than verbal. Allen Weiner has written a book about how you can conduct yourself in the workplace by cultivating good practices of voice and body language.⁵ Although supposedly derived from this rule, most of Weiner’s excellent advice does not depend on the truth of a 7-38-55 rule.

There are many reasons to doubt the universality of this rule. Mehrabian wrote about listeners reacting to recordings of single words, rating the emotional content of the words, and seeing if the rated emotions agreed with facial photographs of speakers. Mehrabian himself emphasizes the studies were about communicators talking about their feelings or attitudes and that little can be inferred for other contexts.

Philip Yaffe attacks this rule.⁶ Many communications are in the form of speech or text; they are delivered by email or Web pages, not good media for communicating voice or body language. In delivered speech or a conversation, the persuasiveness is mostly a function of the words and their resonance with the concerns of the listener. Certainly, incongruous voice or body language can undermine your listener’s trust, but no amount of those factors will overcome the lack of content. Yaffe points out that Abraham Lincoln’s Gettysburg Address is one of the most famous speeches of all time, and yet no one has the slightest idea of Lincoln’s tone of voice or body language.

Conclusion

There are many simple, easy-to-remember numerical rules of thumb. Many are catchy and seem to accord with our experience. They become “sticky memes” that people pass around as conventional wisdom. Not suspecting these sticky stories are mostly anecdotal, many people draw unwarranted conclusions.

When we teach math and computing, we know it is a serious mistake to start with the simple mathematical law abstracted from generations of thought about many real cases. It is far better to teach meaningful examples, and then summarize them with a law.

As a rule of thumb, rules of thumb are most useful when they summarize well-understood real-world experience. In the hands of the inexperienced, they are easily misapplied.

Figures

Figure. Example log-log plots of two Pareto datasets.
