Opinion
Last Byte

Sampling the Goods

Clever sampling for the worst/average case.

Posted

The basic principle of sampling is to gain as much information as possible from each sample. If you know what you will find in a sample, then there is no point in sampling it. The more uncertainty, the more you learn—the basic principle of information theory.

Warm-up 1: Consider that you are faced with three containers, each containing 1,000 packages as in the figure. One container holds packages that are all damaged. One container holds packages that are all good. One container contains half good and half bad. If you want 1,000 good packages, describe an inspection strategy that guarantees you can identify 1,000 good packages and determine which container contains all good, which contains all damaged, and which is mixed. Note that inspection does not destroy a good package.

Figure. There are three cargo containers. Each container contains 1,000 packages. One contains packages that are all damaged. One contains packages that are all good. One contains 500 of each. You want 1,000 good packages. Inspecting a single package takes an hour, so you would like to inspect as few as possible. What strategy should you use and how long will you expect to inspect?

Solution to Warm-up 1: Call the containers A, B, and C. Take packages in turn from each container. If any package from a container is damaged, then ignore that container in the sequel. If you get to the point there is only one container left, that is the container you want and no more inspections are needed. This works because any container that has any damaged package is either mixed or fully damaged. The container without any damaged package must therefore be the good one. You will find the container all of whose packages are damaged in the first round robin for sure. If you never find the mixed one after choosing 500 from each of two containers, then one more examination will determine which container has all good packages and which container is mixed.

Warm-up 2: If you use the Warm-up 1 strategy, what is the smallest number of packages that need to be inspected for you to be sure to know which container contains only good packages?

Solution to Warm-up 2: Two packages is the minimum to inspect. If the packages from containers A and B are both bad, then surely C has only good packages.

Question 1: What is the maximum number of packages you might need to inspect using the Warm-up 1 strategy?

Inspection does not destroy a good package.

Solution to Question 1: The worst case for that strategy is the first round finds two good (undamaged) packages and one damaged one. Subsequent rounds on the remaining two containers keep finding good packages until the 500th package of the mixed container is selected. At that point, you have 1,000 good packages. So this requires 1,001 inspections.

Question 2: The Warm-Up 1 strategy requires many inspections in the worst case. Is there a strategy that guarantees you will not need more than 502 inspections to identify 1,000 good packages?

Solution to Question 2: Take one from each of two containers, say A and B. If you get two damaged packages you are done: container C has only good packages. But in the worst case, you will get one damaged package and one good one. Keep looking in the container, say B, having the good package. If you ever find a bad package, then, container C has only good packages. Otherwise, after examining the 501st package from B, you will know B has only good packages.

The trouble with the Question 2 strategy is approximately half the time, you will inspect at least 501 packages. It is very unlikely the Warm-Up strategy will require the inspection of even a fraction of that number.

Question 3: What is the likelihood the Warm-Up 1 strategy will require the inspection of more than 21 packages? An answer within a factor of two is fine.

Solution to Question 3: Appromimately 1 in 1,000. In the first round, we get one bad and two good packages. We discard the container having the bad one and then have just containers B and C. Now, let’s focus on the mixed container, say B. For B to survive round k, its package has to be good. The probability that this is so is 1/2 (slightly less because there have been only good packages prior to the kth round). Thus, for B to survive until round 10 (corresponding to the first round where three packages are inspected and 9 rounds when two packages are inspected for a total of 21 packages), B must survive 10 tests with a 1/2 chance of getting caught, which is 1/1,024. So, the Warm-Up 1 strategy requires fewer inspections than the Question 2 strategy most of the time.

What is a strategy to get 1,000 good packages while minimizing the average case number of inspections?

Upstart 1: Suppose the mixed container has a fraction f of good packages and 1-f of bad and f > 1-f. For some value of f, could the Warm-Up 1 strategy require fewer inspections than the Question 2 strategy in the worst case? In the average case?

Upstart 2: What if all containers have different fractions of good packages. We know the fractions, but we do not know which container has which fraction. Also, there are at least 1,000 good packages among all containers. What is a strategy to get 1,000 good packages while minimizing the average case number of inspections?

Upstart 3: Same as Upstart 2 but we want 1,000 packages of which at most a fraction b could be bad.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.