Email users all over the world are being swamped by unsolicited commercial email (“spam”). Even the world’s richest person has not been spared:
“Like almost everyone who uses email, I receive a ton of spam every day. …But spam is worse than irritating. It is a drain on business productivity, an increasingly costly waste of time and resources that clogs corporate networks and distracts workers.” —Bill Gates [2]
Policymakers, Internet service providers, software vendors, and scholars are struggling to devise technological, regulatory, and social solutions [6]. However, a major obstacle for policymakers is that scientific research into the spam industry has been very limited. Even the most basic question—whether spam is sent randomly or targeted—remains open. U.S. Federal Trade Commission Chairman Timothy Muris [7] has asserted:
“Unlike phone calls or mail solicitations, sending additional spam is essentially costless. …Because email technology allows spammers to shift the costs almost entirely to third parties, there is no incentive for the spammers to reduce the volume. …At our Spam Forum, a bulk emailer testified that he could profit even if his response rate was less than 0.0001%.”
If spam is essentially costless to send, spammers should broadcast solicitations repeatedly to all available email addresses. As Bill Gates remarked:
“Knowing that only a small percentage of their output will get past today’s filters, spammers have responded by significantly cranking up the volume of emails they send” [3].
We conducted a field experiment to learn more about spam. Our first objective was to determine whether spam is randomly distributed or targeted. If spam is not randomly broadcast, what factors determine the rate of spam? It is already well known that email addresses posted on Web sites or in newsgroups [1, 9], as well as those whose owners do not opt out of receiving marketing communications [5], attract relatively more spam. Accordingly, Internet users have been advised to disguise or conceal their email addresses to keep spammers from harvesting them, and to opt out of receiving communications. Our second objective was to investigate what other factors influence the distribution of spam.
To jumpstart our experiment, we established email accounts at various Web-based email services for fictitious persons with various demographic characteristics (declared interest in particular products, age, and nationality). Over a period of 33 weeks, we monitored the resulting spam and analyzed the spam according to the personal characteristics.
Persons who declared interest in particular products received more spam than those who did not; those aged 30 received more spam than those aged 15; and U.S. residents received more spam than Singapore residents. Spam rates, however, did not differ across email accounts that were associated with men versus women. All of these findings support the hypothesis that spam is targeted at segments that are relatively more likely to make online purchases.
Among the other factors that influenced spam rates, we found that spam was highest among Hotmail accounts, followed in decreasing order by Lycos, Excite, and Yahoo! accounts. Indeed, the identity of the email provider was the most important determinant of the spam rate. Consistent with previous studies, we also found that email addresses exposed through Web pages received more spam.
Our experiment was designed to test the following hypotheses:
Hypothesis #1. Spam rates would be higher for persons with declared interest in some product or service than for those with no declared interest. The objective of spam is to promote sales. Hence, if spammers target their email messages, they should target the consumer segments more likely to purchase the item being promoted. However, if spam is randomly distributed, then consumers who are more likely to make online purchases should not receive any more spam than others.
We identified consumers as relatively more likely to make online purchases in two ways. One way was for the person to explicitly state his or her interest in some product or service at the point of registration for the email account. Our other approach was to vary the consumers’ demographic characteristics, such as age, gender, and nationality. We relied on the Pew Internet and American Life Project [8], which provides a comprehensive picture of U.S. consumer behavior online.
Hypothesis #2. Spam rates would be higher in email accounts associated with individuals aged 30 than those aged 15. Historically, the 30–49 age group exhibited the highest rate of online purchases, but by December 2002, the 18–29 group had caught up, and both groups exhibited the same 63% rate [8]. The Pew Project did not even consider individuals aged below 18 in its e-commerce sample. An obvious reason is that they would not be eligible for credit cards.
Hypothesis #3. Spam rates would not differ for email accounts associated with men relative to women. The Pew Project found no significant difference in online consumer behavior by gender: “On any given day between March 2000 and December 2002, one would find roughly the same portion [sic] of men and women buying products online” [8].
Hypothesis #4. Spam rates would be higher in email accounts associated with U.S. residents than in those associated with Singapore residents. With regard to nationality, the e-commerce participation rate is 22.7% among Singaporeans with Internet access [4] as compared with 61% among Americans [8].
Finally, to explore other factors that influence the spam rate, we considered the identity of the email service provider in addition to a known determinant: publication of the email address on a Web page.
In early August 2003, we created a total of 288 Web-based email accounts for fictitious persons at Excite, Hotmail, Lycos, and Yahoo. The persons were distinguished on the following dimensions:
- Declared interests: computers and technology, travel, casino, or none,
- Age: 15 or 30,
- Gender: female or male,
- Residence: Singapore or U.S.
We created three accounts in each unique demographic combination, for a total of 4 (interests) x 2 (ages) x 2 (genders) x 2 (residences) x 3 (accounts) = 96 email accounts at each of Lycos and Excite. We created only 72 accounts at Yahoo, as it did not offer casino gambling on its list of interests, and only 24 accounts at Hotmail, as it did not allow interests to be indicated at the point of registration. Hence, the total number of accounts created was 288.
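The account totals above can be checked by enumerating the design. The following is a minimal sketch, assuming each provider differs only in which interests could be declared at registration; the names `PROVIDER_INTERESTS` and `accounts_for` are illustrative and do not come from the study.

```python
from itertools import product

# Factor levels as described in the text; the data layout is illustrative.
INTERESTS = ["computers and technology", "travel", "casino", "none"]
AGES = [15, 30]
GENDERS = ["female", "male"]
RESIDENCES = ["Singapore", "U.S."]
ACCOUNTS_PER_CELL = 3

# Interests that could be declared at registration with each provider.
PROVIDER_INTERESTS = {
    "Lycos": INTERESTS,
    "Excite": INTERESTS,
    "Yahoo": [i for i in INTERESTS if i != "casino"],  # no casino option
    "Hotmail": ["none"],  # no interest declaration at registration
}


def accounts_for(provider):
    """List every (provider, interest, age, gender, residence, replicate) tuple."""
    cells = product(PROVIDER_INTERESTS[provider], AGES, GENDERS, RESIDENCES)
    return [
        (provider, interest, age, gender, residence, replicate)
        for interest, age, gender, residence in cells
        for replicate in range(1, ACCOUNTS_PER_CELL + 1)
    ]


counts = {p: len(accounts_for(p)) for p in PROVIDER_INTERESTS}
total = sum(counts.values())
print(counts, total)
```

Under these assumptions the enumeration gives 96 accounts each at Lycos and Excite, 72 at Yahoo, and 24 at Hotmail, matching the stated total of 288.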
For 192 “exposer” accounts (two in each demographic combination), we created a Web page that included the person’s email address and other personal details at Yahoo GeoCities. To maintain an appearance of activity, we regularly sent email from these accounts. For the remaining 96 “control” accounts (one in each demographic combination), we did not construct a GeoCities Web page.
When establishing the synthetic email accounts, we accepted the default type and level of anti-spam tools. Of the email service providers that we used for the experiment, all except Excite provided a basic spam guard that directed suspected spam into a bulk folder.
Over the subsequent 33-week period, we monitored the number of unsolicited commercial email messages received (“spam rate”) at each email address. The experiment concluded in March 2004.
Results and Analysis
Over the experimental period, the control accounts received an average of 4.60 (standard deviation 5.59) spam email messages, while the exposer accounts received an average of 6.84 (standard deviation 6.96). Table 1 reports descriptive statistics of the spam rates among the control and exposer accounts in the various demographic segments.
Most of the spam originated from the email service providers and their marketing collaborators. The source could be identified from statements or illustrations that marked their affiliation to the respective email service provider.
Our reported spam rates may seem low. However, they are reasonable given that the email accounts opted out of receiving special offers and other marketing communications. Further, the email accounts were not used to engage in any online transactions. If the accounts had not opted out, any subsequent commercial email would not have been “unsolicited.” In August 2001, Jamal et al. [5] registered 200 email addresses in 69 top commercial Web sites. They opted out in 100 registrations. Over the subsequent 26 weeks, the 100 opt-out addresses received an average of 5.01 spam email messages. This is strikingly similar to the spam rate in our experiment. Their other 100 addresses received an average of 151.43 spam email messages.
Jamal’s experiment shows that email accounts that do not opt out will receive substantially more spam. Further, we conjecture that accounts used to engage in online transactions would also receive substantially more spam. These two factors probably account for most of the difference in spam rates between our synthetic accounts and those of real people.
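Because the two experiments ran for different lengths of time (26 versus 33 weeks), one rough way to compare them is to convert each average into messages per account per week. The sketch below does only that arithmetic on the figures quoted above; the per-week normalization is an illustrative assumption and not a calculation reported by either study.

```python
# Figures quoted in the text: (mean spam messages per account, observation weeks).
STUDIES = {
    "this study, control": (4.60, 33),
    "this study, exposer": (6.84, 33),
    "Jamal et al., opt-out": (5.01, 26),
    "Jamal et al., no opt-out": (151.43, 26),
}

# Assumption: spam arrives at a roughly constant rate over each study window.
weekly = {name: mean / weeks for name, (mean, weeks) in STUDIES.items()}

for name, rate in weekly.items():
    print(f"{name}: {rate:.2f} messages per account per week")
```

On this basis the opt-out addresses of Jamal et al. (about 0.19 per week) fall between the control (about 0.14) and exposer (about 0.21) accounts, while the addresses that did not opt out received nearly 6 messages per week.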
We performed ordinary least squares regressions to test our hypotheses. For each email account, the quantity of spam was the dependent variable, and the various account characteristics were the independent variables. Table 2 reports results of the least squares regressions. In column (a), we report a regression with just a constant and three variables indicating the various email service providers in which the accounts were created. Relative to Hotmail (the default email service provider), the coefficients of the three service provider variables were all negative and significant, indicating that their accounts received less spam than those registered with Hotmail.
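To illustrate how the provider indicators in column (a) are read, the sketch below fits ordinary least squares with a constant and three 0/1 provider variables, leaving Hotmail as the omitted baseline. The spam counts are made-up placeholder numbers, not the experiment's data, and the helpers `design_row`, `solve`, and `ols` are hypothetical names written for this example only.

```python
# Hypothetical toy data: (provider, spam count). NOT the experiment's data.
TOY = [
    ("Hotmail", 20), ("Hotmail", 24),
    ("Lycos", 9), ("Lycos", 11),
    ("Excite", 5), ("Excite", 7),
    ("Yahoo", 1), ("Yahoo", 3),
]
DUMMIES = ["Lycos", "Excite", "Yahoo"]  # Hotmail is the omitted baseline


def design_row(provider):
    """Constant term followed by one 0/1 indicator per non-baseline provider."""
    return [1.0] + [1.0 if provider == d else 0.0 for d in DUMMIES]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


X = [design_row(p) for p, _ in TOY]
y = [float(s) for _, s in TOY]
coef = dict(zip(["const"] + DUMMIES, ols(X, y)))
print(coef)
```

With only a constant and provider indicators, the constant equals the baseline (Hotmail) mean and each coefficient equals that provider's mean minus the Hotmail mean, so a negative coefficient indicates less spam than Hotmail. The actual regressions in Table 2 would also report standard errors and significance, which this sketch omits.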
Column (b) included additional variables characterizing differences among the persons associated with the email accounts. The results were partly consistent with Hypothesis 1. Accounts that declared interest in travel or computing and technology received significantly more spam than those that did not declare such interest. However, accounts that declared interest in casino gambling did not receive significantly more spam than those that did not declare such interest. This result is nonetheless consistent with Hypothesis 1 because U.S. law prohibits online gambling.
The empirical results were consistent with our second, third, and fourth hypotheses. The coefficient of AGE (0 = 15 years old; 1 = 30 years old) was positive and significant. The coefficient of GENDER (0 = female; 1 = male) was not significantly different from zero. The coefficient of NATIONALITY (0 = Singapore, 1 = U.S.) was positive and significant. We infer that Internet users aged 30 and U.S. residents received significantly more spam than those aged 15 and Singapore residents respectively, and men did not receive significantly more spam than women.
Finally, we investigated other factors that influenced spam rates. Table 2, column (c), includes an additional variable, EXPOSER (0 = no Web page; 1 = published a Web page on Yahoo GeoCities). The coefficient was positive and significant, a result consistent with prior studies [1, 9].
Comparing Table 2, columns (a)–(c), the identity of the email service provider is evidently the most important influence on the spam rate. The model in column (a) accounts for over 80% of the variance in spam rates. Each of the variables representing a particular email service provider is significant well beyond conventional levels. The other explanatory variables (declaration of interest in a product or service, age, gender, nationality, and exposure on a Web page) added less than 8% additional explanatory power.
Conclusion
We found that spam is not random, but quite systematically targeted at consumer segments that are relatively more likely to make online purchases—those who declare interest in particular products or services, adults, and U.S. residents.
Our most surprising finding was that, by far, the most important influence on the spam rate was the identity of the email service provider. Specifically, Hotmail accounts received significantly more spam than accounts set up with other email service providers. This effect was more important than declaration of interest, demographic factors, and whether the email address had been published on a Web page.
We should caution that this finding arose in a context where spam was truly unsolicited and almost all the spam originated from email service providers and their marketing collaborators. Further, the email accounts we created were not used to engage in any online transactions. Subject to these provisos, our results imply that consumers should take care in choosing email service providers and declaring interests when registering for an email account.
An important direction for future research is to extend our experiments by using some of the email accounts to engage in online transactions, and specifically, make online purchases. It would be important to observe the impact of these activities on the extent of spam received.
Policymakers, Internet service providers, software vendors, and scholars all over the world are struggling to devise technological, regulatory, and social solutions to spam. Our results contribute to these efforts by providing a better understanding of the business of spam.