Citizens worldwide have expressed serious concerns about the way online services manage personal information. For instance, the 2015 Eurobarometer on data protection13 reveals that 63% of citizens within the European Union (EU) do not trust online businesses, more than half do not like providing personal information in return for free services, and 53% do not like Internet companies using their personal information for tailored advertising. Similarly, a recent survey of U.S. users9 reveals that 53% of respondents were against receiving tailored ads based on the information websites and apps learn about them, 42% do not think websites care at all about using users' data securely and responsibly, and 73% consider that websites know too much about users. A 2016 survey conducted by the Internet Society (ISOC) in the Asia-Pacific region8 found that 59% of respondents did not feel their privacy was sufficiently protected when using the Internet, and 45% considered getting the attention of their country's policymakers on data protection a matter of urgency.
Policymakers have reacted to this situation by passing or proposing new privacy and data protection regulations. For instance, in May 2018, the EU began enforcing the General Data Protection Regulation (GDPR)6 across all 28 member states. Similarly, in June 2018, California passed the California Consumer Privacy Act,3 which has been described as the nation's toughest data privacy law. In countries such as Argentina and Chile, governments proposed new bills in 2017 to update their existing data protection regulations.11 In this article, we take the GDPR as our reference, since it affects the most countries, citizens, and companies.
The GDPR, like most data protection regulations, defines some categories of personal data as sensitive and prohibits processing them except in limited cases (for example, when the user gives explicit consent to process that sensitive data for a specific purpose). In particular, the GDPR defines sensitive personal data as: "data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation."
In a recent work,2 we demonstrated that Facebook (FB) labels 73% of users within the EU with potentially sensitive interests (also referred to as ad preferences), which may contravene the GDPR. FB assigns users different ad preferences based on their online activity within this social network. Advertisers running ad campaigns can target groups of users that have been assigned a particular ad preference (for example, FB users interested in Starbucks). Some of these ad preferences may suggest political opinions (for example, Socialist party), sexual orientation (for example, homosexuality), personal health issues (for example, breast cancer awareness), and other potentially sensitive attributes. In the vast majority of cases, these sensitive ad preferences are inferred from the user's behavior on FB without obtaining the user's explicit consent. Advertisers can then reach FB users based on ad preferences tightly linked to sensitive information. For instance, one of the authors of this article received the ad shown in Figure 1 (left side). The text of the ad clearly shows that it targeted homosexual people. The author had not explicitly declared his sexual orientation, but he discovered that FB had assigned him the "Homosexuality" ad preference (see Figure 1, right side).
Figure 1. Snapshot of an ad received by one of the authors of this article and ad preference list showing that FB inferred this person was interested in homosexuality.
First, this article extends the scope of our analysis from the EU to 197 countries worldwide, using data collected in February 2019. We quantify the portion of FB users across these 197 countries that have been assigned ad preferences linked to potentially sensitive personal data.
Second, we analyze whether the GDPR, which became enforceable on May 25, 2018, had any impact on FB's use of sensitive ad preferences. To this end, we compare the number of EU users labeled with potentially sensitive ad preferences in January 2018, October 2018, and February 2019 (five months before, five months after, and nine months after the GDPR was enforced, respectively).
Third, we discuss privacy and ethics risks that may derive from the exploitation of sensitive FB ad preferences. As an illustrative example, we quantify the portion of FB users labeled with the ad preference Homosexuality in countries where homosexuality can be punished with the death penalty.
Finally, we present a technical solution that allows users to easily remove the sensitive interests FB has assigned them.
Advertisers configure their ad campaigns through the FB Ads Manager.a It allows advertisers to define the audience (that is, the user profile) they want to target with their advertising campaigns, and it can be accessed through either a dashboard or an API. The FB Ads Manager offers advertisers a wide range of configuration parameters, including (but not limited to) location (country, region, and so on), demographic parameters (gender, age, among others), behaviors (mobile device, OS and/or Web browser used, and so on), and interests (sports, food). The interest parameter is the most relevant one for our work: it includes hundreds of thousands of options covering users' interests of any type. The FB Ads Manager also provides detailed information about the configured audience. The most relevant element for this article is the Potential Reach, which reports the number of monthly active FB users matching the defined audience.
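As a minimal sketch of what an audience definition might look like when submitted through the API, the Python function below assembles one as a plain dictionary. The field names (`geo_locations`, `age_min`, `age_max`, `genders`, `interests`) are modeled on the targeting specifications of Facebook's Marketing API, but they are assumptions to be checked against the current documentation; the interest ID is a hypothetical placeholder, and the sketch only builds the request body without contacting FB.

```python
import json


def build_targeting_spec(countries, age_min, age_max, genders, interests):
    """Assemble a hypothetical audience definition as a JSON-serializable dict.

    interests: list of (interest_id, name) pairs; the IDs are placeholders.
    """
    if age_min > age_max:
        raise ValueError("age_min must not exceed age_max")
    return {
        "geo_locations": {"countries": list(countries)},  # location parameter
        "age_min": age_min,                                # demographic parameters
        "age_max": age_max,
        "genders": list(genders),                          # assumed numeric coding
        "interests": [{"id": i, "name": n} for i, n in interests],  # interest parameter
    }


spec = build_targeting_spec(["ES"], 18, 65, [1, 2], [("0000000000001", "Watches")])
print(json.dumps(spec, indent=2))
```

Keeping the audience as plain data makes it easy to log each definition that was queried next to the Potential Reach returned for it.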
In parallel, FB assigns each user a set of ad preferences, that is, a set of interests derived from the user's data and activity on FB. These ad preferences are the same interests offered to advertisers in the FB Ads Manager.b Therefore, if a user has "Watches" in her list of ad preferences, she is a potential target of any FB advertising campaign configured to reach users interested in watches. It is important to note that ad preferences in the FB ad ecosystem are available worldwide; there are no country-specific ad preferences.
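The link between assigned ad preferences and targetable audiences can be illustrated with a toy model. The `User` class, the helper functions, and the data below are hypothetical; the real Potential Reach is computed by FB over its monthly active users, not over an explicit list like this one.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """A toy FB user: home country plus the ad preferences FB has assigned."""
    user_id: int
    country: str
    ad_preferences: set = field(default_factory=set)


def in_audience(user, country, interest):
    """A user is reachable by a campaign targeting `interest` in `country`
    if that interest appears in the user's list of ad preferences."""
    return user.country == country and interest in user.ad_preferences


def potential_reach(users, country, interest):
    """Count the toy users matching the audience, mimicking Potential Reach."""
    return sum(in_audience(u, country, interest) for u in users)


users = [
    User(1, "ES", {"Watches", "Football"}),
    User(2, "ES", {"Cooking"}),
    User(3, "FR", {"Watches"}),
]
print(potential_reach(users, "ES", "Watches"))  # 1
```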
The dataset used in this work was collected with our FDVT Web browser extension.1 The Data Valuation Tool for Facebook Users (FDVT) is a Web browser extension currently available for Google Chromec and Mozilla Firefox.d Its main functionality is to provide users with a real-time estimate of the revenue they generate for FB from the ads they receive on FB. To compute this estimate, we obtain from the FB API the prices advertisers are willing to pay to show ads to, and obtain clicks from, users with the same profile as the FDVT user, and we count the ads the FDVT user receives and clicks during a Facebook session. Among other data, the FDVT collects the ad preferences FB assigns to the user by parsing the user's ad preferences page,e where any user can find her list of ad preferences. All FDVT users granted us explicit permission to use the collected information (in an anonymous manner) for research purposes. We leverage this information to identify potentially sensitive ad preferences assigned to users who have installed the FDVT.
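The estimate described above combines prices with counts of ads seen and clicked. The sketch below assumes a simple additive model with a price per 1,000 impressions (CPM) and a price per click (CPC); this is our simplification for illustration, not necessarily the FDVT's exact computation, and all input values are made up.

```python
def estimate_session_revenue(impressions, clicks, cpm_eur, cpc_eur):
    """Rough revenue estimate for one session (hypothetical additive model).

    cpm_eur: price advertisers pay per 1,000 impressions for this user profile.
    cpc_eur: price advertisers pay per click for this user profile.
    """
    if impressions < 0 or clicks < 0:
        raise ValueError("counts must be non-negative")
    return impressions * cpm_eur / 1000 + clicks * cpc_eur


# Illustrative numbers only: 40 ads seen, 1 click, CPM of 2.00 EUR, CPC of 0.30 EUR.
print(round(estimate_session_revenue(40, 1, 2.0, 0.30), 4))  # 0.38
```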
Finally, we can query the FB Ads Manager API to retrieve the Potential Reach (that is, the number of active FB users) of any FB audience, including audiences defined by one or more ad preferences. Hence, we can obtain the number of FB users in any country (or group of countries) that have been assigned a particular interest (or group of interests).
We seek to quantify the number of FB users that have been assigned potentially sensitive ad preferences across 197 countries in February 2019. To this end, we follow a two-step process.
First, we identify likely sensitive ad preferences within five of the categories the GDPR lists as sensitive personal data: racial or ethnic origin, political opinions, religious or philosophical beliefs, health, and sexual orientation. This article reuses the list of 2,092 potentially sensitive ad preferences we obtained in Cabañas et al.2 by analyzing more than 126k unique ad preferences, assigned 5.5M times to more than 4.5k FDVT users.
To extract that list, we first implemented an automatic process that reduced the list of 126k ad preferences to 4,452 likely sensitive ad preferences. Next, we recruited a group of 12 panelists who manually classified each of the 4,452 ad preferences as sensitive, if it could be assigned to one of the five sensitive categories listed above, or non-sensitive. All panelists are researchers (faculty or Ph.D. students) with some knowledge of privacy. Each ad preference received five votes, and we used majority voting10 to classify each ad preference as either sensitive or non-sensitive. Overall, 2,092f of the 4,452 ad preferences were labeled as sensitive. We refer to this subset of 2,092 ad preferences as the suspected sensitive subset. We collected this set in January 2018 and checked that 2,067 of these 2,092 potentially sensitive ad preferences were still available in the FB Ads Manager in February 2019.
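With two possible labels and five votes per ad preference, majority voting reduces to checking whether at least three votes say "sensitive," and a tie cannot occur. The function name and the votes below are hypothetical and do not reproduce the panel's actual data.

```python
VOTES_PER_PREFERENCE = 5
LABELS = ("sensitive", "non-sensitive")


def majority_label(votes):
    """Label an ad preference 'sensitive' if at least 3 of its 5 votes say so."""
    if len(votes) != VOTES_PER_PREFERENCE:
        raise ValueError(f"expected {VOTES_PER_PREFERENCE} votes, got {len(votes)}")
    if any(v not in LABELS for v in votes):
        raise ValueError("unexpected label")
    # With two labels and an odd number of votes, a tie is impossible.
    return "sensitive" if votes.count("sensitive") > len(votes) // 2 else "non-sensitive"


# Hypothetical votes for anonymized preferences, not the panel's real data.
panel_votes = {
    "Preference A": ["sensitive"] * 5,
    "Preference B": ["non-sensitive"] * 4 + ["sensitive"],
    "Preference C": ["sensitive", "non-sensitive", "sensitive", "non-sensitive", "sensitive"],
}
labels = {pref: majority_label(v) for pref, v in panel_votes.items()}
print(labels)
```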
Second, we leveraged the FB Ads Manager API to retrieve the portion of FB users in each country that had been assigned at least one of the top N (with N ranging between 1 and 2,067) potentially sensitive ad preferences from the suspected sensitive subset. In particular, we retrieve how many users in a given country are interested in ad preference 1 OR ad preference 2 OR ad preference 3… OR ad preference N. An example for N = 3 is "How many people in France are interested in Pregnancy OR Homosexuality OR Veganism?" We define the following metric, which we use in the rest of the article.
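An OR query counts each user once, no matter how many of the listed ad preferences that user holds, so its result is the size of a set union rather than the sum of the per-preference audiences. The toy user IDs below are hypothetical and serve only to show the difference.

```python
# Toy data: which (hypothetical) user IDs in one country hold each ad preference.
holders = {
    "Pregnancy": {1, 2, 3},
    "Homosexuality": {3, 4},
    "Veganism": {2, 5},
}


def reach_of_union(holders, preferences):
    """Users interested in pref_1 OR pref_2 OR ... OR pref_N, each counted once."""
    union = set()
    for pref in preferences:
        union |= holders.get(pref, set())
    return len(union)


top3 = ["Pregnancy", "Homosexuality", "Veganism"]
naive_sum = sum(len(holders[p]) for p in top3)
print(naive_sum, reach_of_union(holders, top3))  # 7 5
```

Summing the three audiences would double-count users 2 and 3, which is why the metric relies on the OR query.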
FFB(C,N). Percentage of FB users in country C that have been assigned at least one of the top N potentially sensitive ad preferences from the suspected sensitive subset. Note that C may also refer to all the countries forming a particular region (for example, EU, Asia-Pacific, U.S., among others). FFB(C,N) is computed as the ratio between the number of FB users in country C that have been assigned at least one of the top N potentially sensitive ad preferences and the total number of FB users in country C. Finally, the FB Ads Manager API only allows creating audiences with at most 1,000 interests. Therefore, in practice, the maximum value of N we can use to compute FFB is 1,000.
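In symbols (our notation), FFB(C,N) = 100 × R(C; p1 OR … OR pN) / R(C), where R is the Potential Reach in C and p1, …, pN are the top N potentially sensitive ad preferences. A minimal Python sketch of the ratio, enforcing the 1,000-interest cap mentioned above, follows; the function name and the reach values are hypothetical.

```python
MAX_INTERESTS_PER_AUDIENCE = 1000  # API limit reported in the text


def ffb(reach_any_of_top_n, total_fb_users, n):
    """FFB(C, N) in percent: FB users in C holding at least one of the top-N
    potentially sensitive ad preferences, over all FB users in C."""
    if not 1 <= n <= MAX_INTERESTS_PER_AUDIENCE:
        raise ValueError(f"N must be between 1 and {MAX_INTERESTS_PER_AUDIENCE}")
    if total_fb_users <= 0 or not 0 <= reach_any_of_top_n <= total_fb_users:
        raise ValueError("reach must lie between 0 and the total number of FB users")
    return 100.0 * reach_any_of_top_n / total_fb_users


# Hypothetical Potential Reach values for one country, not measured data.
print(ffb(reach_any_of_top_n=6_700_000, total_fb_users=10_000_000, n=1000))  # 67.0
```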
Exposure of FB Users to Potentially Sensitive Ad Preferences
We have computed the portion of FB users in each of 197 countries that have been assigned at least one of the top 1,000 (out of 2,067) potentially sensitive ad preferences. Figure 2 shows a choropleth map of FFB(C,1000) for those countries in February 2019.
Figure 2. Choropleth map of the percentage of FB users assigned potentially sensitive ad preferences (FFB(C,1000)) for the 197 countries analyzed in the article.
If we consider the 197 countries altogether, 67% of FB users are tagged with at least one potentially sensitive ad preference. These users correspond to 22% of the citizens of the 197 analyzed countries, according to the population data reported by the World Bank.g However, FFB varies considerably across countries.
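The two aggregate figures are linked through FB penetration: the share of citizens labeled equals FFB multiplied by the ratio of FB users to population. Inverting that relation with the quoted numbers gives the penetration they imply. Both inputs are rounded, so the result is only approximate, and the sketch assumes the population and FB user counts cover the same set of countries; the function name is ours.

```python
def share_of_citizens(ffb_pct, fb_penetration):
    """Percent of all citizens labeled = FFB (percent of FB users) x (FB users / population)."""
    return ffb_pct * fb_penetration


# Aggregate figures quoted in the text for the 197 countries.
ffb_world_pct = 67.0
citizens_world_pct = 22.0

# Inverting the relation gives the FB penetration these two numbers imply.
implied_penetration = citizens_world_pct / ffb_world_pct
print(f"{implied_penetration:.1%}")  # 32.8%
```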
The most affected country is Malta, where 82% of FB users are assigned some potentially sensitive ad preference. In contrast, the least affected country is Equatorial Guinea, where 37% of FB users are assigned potentially sensitive ad preferences.
More interestingly, the map suggests that Western countries have a higher exposure to potentially sensitive ad preferences than Asian and African countries. To quantify this effect, we computed the Pearson correlation between the FFB metric and the following socio-economic indicators: FB penetration; expected years of schooling; access to a mobile phone or the Internet at home; GDP per capita; voice and accountability; and birth rate. Western developed countries show higher values for all these indicators except birth rate. Hence, we hypothesize that FFB is positively correlated with all the indicators except birth rate. Table 1 shows the resulting correlations.
Table 1. Pearson correlation and p-value between FFB and six socio-economic development indicators of each country.
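For reference, the Pearson coefficient is the covariance of the two samples divided by the product of their standard deviations. The stdlib-only sketch below computes it on hypothetical per-country values that are not the data behind Table 1; in practice, `scipy.stats.pearsonr` computes the same coefficient and also reports a two-sided p-value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two samples of equal length with at least 3 points")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (norm_x * norm_y)


# Hypothetical per-country values, NOT the data behind Table 1.
ffb_pct = [45.0, 58.0, 63.0, 71.0, 79.0]
gdp_per_capita_k_usd = [1.2, 4.5, 9.8, 30.1, 48.7]
print(round(pearson(ffb_pct, gdp_per_capita_k_usd), 3))
```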
The results corroborate our hypothesis, since all indicators except birth rate are positively correlated with FFB. In summary, the results support our initial observation that FB users in Western developed countries are more likely to be labeled with sensitive ad preferences than users in Africa and Asia. Interestingly, South America shows a similar pattern: its most powerful economies and most developed countries, such as Brazil, Chile, and Argentina, show higher exposure to sensitive ad preferences than other South American countries.
Exposure of FB Users to Very Sensitive Ad Preferences
Although legislation tries to define what sensitive data is, some people may think that not all sensitive data items are equally sensitive. For instance, data revealing someone's sexual orientation could be considered more sensitive than data showing that a user may have the flu. Therefore, the level of sensitivity of the interests in our list is likely subjective and depends on each person's perception.
Here, we narrow our analysis to a shorter list of interests that undoubtedly match the GDPR definition of sensitive personal data. We examined a subset of 15 ad preferences that we consider not compliant with the GDPR provisions on sensitive personal data. To support this assessment, we asked an expert from the Spanish Data Protection Authority (DPA) to validate it. This expert, who has both a very deep knowledge of the GDPR and a technical background that allows him to fully understand the FB advertising ecosystem, verified that, in his opinion, these 15 ad preferences do not comply with the GDPR.
For each of the 197 analyzed countries, we retrieve the portion of FB users that have been assigned each of the 15 expert-verified ad preferences, as well as the portion assigned at least one of them. Since it is infeasible to show the results for every country in this article, we group the countries into five continents: Africa, America, Asia, Europe, and Oceania. The disaggregated results for each country are available at the following external link.h
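The text does not detail how continent-level figures are derived. One natural approach, which we assume here, is to pool user counts across a continent's countries before dividing, so that each country weighs according to its number of FB users; this matches querying the whole continent at once if each user is attributed to exactly one country. The sketch contrasts it with a plain average of per-country percentages, using hypothetical figures.

```python
# Hypothetical per-country figures: (FB users, users assigned the ad preference).
countries = {
    "Country A": (50_000_000, 2_000_000),
    "Country B": (1_000_000, 100_000),
}


def aggregate_ffb(rows):
    """Pool counts across countries before dividing (user-weighted FFB)."""
    total_users = sum(users for users, _ in rows)
    total_labeled = sum(labeled for _, labeled in rows)
    return 100.0 * total_labeled / total_users


def unweighted_mean_ffb(rows):
    """Average of per-country percentages (each country counts equally)."""
    return sum(100.0 * labeled / users for users, labeled in rows) / len(rows)


rows = list(countries.values())
print(round(aggregate_ffb(rows), 2), round(unweighted_mean_ffb(rows), 2))  # 4.12 7.0
```

The two numbers differ substantially when country sizes are unequal, which is why the aggregation choice matters.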
Table 2 shows FFB for each of the expert-verified sensitive ad preferences within the five continents. The last row, labeled Union, shows the aggregated results considering all 15 interests as a group, while the last column, World, shows the overall results considering all 197 countries. The results show that, across all 197 countries, 33% of FB users, which corresponds to almost 11% of the citizens of those countries, have been labeled with at least one of the 15 sensitive interests in the table. As expected from the correlation results in the previous section, Asia and Africa show the lowest values of FFB (27.62% and 30.43%, respectively). The exposure of FB users grows up to 38.25%.
Table 2. Percentage of FB users (FFB) within Africa, America, Asia, Europe, and Oceania assigned at least one of the 15 sensitive ad preferences verified by the expert as non-GDPR compliant. The last column, "World," shows FFB for the aggregation of all 197 considered countries. The last row shows the result for the 15 ad preferences aggregated.
Looking in detail at some of the ad preferences in the table, we observe that the portion of users worldwide labeled with the ad preference homosexuality is almost 5%. This number doubles for the ad preference bible (intimately related to a particular religious belief) and grows to almost 15% for pregnancy.
Comparison of EU FB Users Exposure to Potentially Sensitive Ad Preferences Before and After GDPR Enforcement
This section analyzes whether the GDPR enforcement had any effect on reducing the use of potentially sensitive ad preferences in the EU. To that end, we compare the exposure of EU users to potentially sensitive ad preferences measured in January 2018 in our previous work2 (five months before the GDPR was enforced) with the exposure measured in October 2018 and February 2019 (five and nine months after the GDPR was enforced, respectively).
The first relevant change is that, of the 2,092 potentially sensitive ad preferences we retrieved in January 2018, FB had removed 19 by October 2018 and 25 by February 2019. Although these numbers are negligible, it is worth noting that five of the removed ad preferences are Communism, Islam, Quran, Socialism, and Christianity. These five ad preferences were included in an initial set of 20 ad preferences verified by the DPA expert as very sensitive. Although the removal of these five ad preferences happened around the GDPR enforcement (between January 2018 and October 2018), we do not know whether FB deleted them in reaction to the GDPR or for a different reason.
Figure 3 shows the difference in FFB, in percentage points, between the results obtained in January 2018 and October 2018 (grey bar) and between January 2018 and February 2019 (black bar) for each of the 28 EU countries and for the EU as a whole (labeled EU28).
Figure 3. Variation of FFB in percentage points for each EU country between the data obtained in January 2018 and October 2018 (five months before and five months after the GDPR was enforced), represented by the grey bar, and between the data obtained in January 2018 and February 2019 (five months before and nine months after the GDPR was enforced), represented by the black bar. The last label (EU28) represents the results for all EU countries together.
In October 2018, the portion of users labeled with potentially sensitive ad preferences was lower than in January 2018 in all EU countries except Spain. However, the aggregated EU reduction is rather small: only three percentage points. The largest reduction, 7.33 percentage points, occurs in Finland.
The slight reduction observed in October 2018 seems to disappear in the February 2019 results. In 13 countries, the portion of users labeled with potentially sensitive ad preferences is higher in February 2019 than in January 2018. Overall, the aggregated portion of EU users labeled with potentially sensitive ad preferences in February 2019 is only about one percentage point lower than in January 2018.
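Because FFB is itself a percentage, a change can be expressed either in percentage points (the plain difference between two FFB values, as in Figure 3) or as a relative change with respect to the starting value; the two numbers differ. The helper names and the FFB values below are hypothetical, not taken from Figure 3.

```python
def pp_change(before_pct, after_pct):
    """Change in percentage points: the difference between two FFB values."""
    return after_pct - before_pct


def relative_change(before_pct, after_pct):
    """Relative change, in percent of the starting value."""
    return 100.0 * (after_pct - before_pct) / before_pct


# Hypothetical FFB values for one country (not taken from Figure 3).
jan_2018, oct_2018 = 75.0, 72.0
print(pp_change(jan_2018, oct_2018), relative_change(jan_2018, oct_2018))  # -3.0 -4.0
```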
In summary, the GDPR has had a negligible impact on preventing FB from using potentially sensitive ad preferences for advertising purposes.
Ethics and Privacy Risks Associated with Sensitive Personal Data Exploitation
The possibility of reaching users labeled with potentially sensitive personal data enables the use of FB ad campaigns to attack (for example, through hate speech) specific groups of people based on sensitive personal data (ethnicity, sexual orientation, religious beliefs, and so on). Even worse, in Cabañas et al.,2 we performed a ballpark estimation showing that, on average, an attacker could retrieve personally identifiable information (PII) of users tagged with some sensitive ad preference through a phishing-like attack7 at a low cost, ranging between €0.015 and €1.5 per user depending on the success ratio of the attack. Next, we describe other potential risks associated with sensitive ad preferences.
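Why the cost depends so strongly on the success ratio can be seen with a deliberately simplified model, which is our assumption and not necessarily the estimation method used in the earlier work: if reaching one user costs a fixed amount and only a fraction of reached users hand over their PII, the cost per obtained record is the reach cost divided by that fraction. The function name and inputs are illustrative only.

```python
def cost_per_record(cost_per_reached_user_eur, success_ratio):
    """Hypothetical model: cost per obtained PII record equals the cost of
    reaching one user divided by the fraction of reached users who fall for it."""
    if not 0 < success_ratio <= 1:
        raise ValueError("success_ratio must be in (0, 1]")
    return cost_per_reached_user_eur / success_ratio


# Illustrative inputs only: 0.02 EUR per reached user.
for ratio in (1.0, 0.1, 0.01):
    print(f"success ratio {ratio:>4}: {cost_per_record(0.02, ratio):.2f} EUR per record")
```

Under this model, every tenfold drop in the success ratio multiplies the cost per record by ten.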
Recently, a Washington Post journalist wrote an article denouncing her own experience after she became pregnant.i It seems FB's algorithms inferred her pregnancy from some of the actions she performed while browsing FB. FB probably labeled her with the ad preference "pregnancy" or a similar one, and she started to receive pregnancy-related ads. Unfortunately, the journalist had a stillbirth, but she kept receiving pregnancy-related ads, which exposed her to a very uncomfortable experience.
Another serious risk, which in our opinion is extremely worrying, is that many FB users are tagged with the interest "Homosexuality" in countries where being homosexual is illegal and may even be punished with the death penalty. There are still 78 countries in the world where homosexuality is penalized,j and in a few of them the maximum punishment is the death penalty. Table 3 shows the FFB metric considering only the interest "Homosexuality" in countries that punish homosexuality with the death penalty. For instance, in Saudi Arabia, FB assigns the ad preference "Homosexuality" to 540K users (2.08% of all FB users in the country). In Nigeria, the figure is 620K users (2.35% of all FB users in the country).
Table 3. Percentage of FB users (FFB) tagged with the interest "Homosexuality" in countries where being homosexual may lead to the death penalty. We do not include Iran and Sudan since FB does not provide information for those countries.
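Because FFB is a ratio, an absolute reach and its FFB together determine a country's total FB user base. As an arithmetic check on the Saudi Arabia and Nigeria figures quoted above (which are rounded, so the result is approximate), the snippet below backs out that total; the function name is our own.

```python
def implied_fb_users(reach, ffb_pct):
    """Back out a country's total FB users from an absolute reach and its FFB (percent)."""
    if ffb_pct <= 0:
        raise ValueError("FFB must be positive")
    return reach / (ffb_pct / 100.0)


# Figures quoted in the text for the "Homosexuality" ad preference.
for country, reach, pct in [("Saudi Arabia", 540_000, 2.08), ("Nigeria", 620_000, 2.35)]:
    print(f"{country}: about {implied_fb_users(reach, pct) / 1e6:.1f}M FB users")
```

Both countries come out at roughly 26 million FB users, so in each of them more than half a million people carry this label.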
We acknowledge that the debate about what is sensitive and what is not is complex. However, we believe FB should take immediate action to avoid worrying and painful situations like the ones described in this section, in which FB may unintentionally expose users to serious risks. The most efficient and privacy-preserving solution would be an opt-in process in which users have to proactively accept receiving targeted ads. That solution would empower users to prevent companies like FB from processing their personal data (including sensitive data) for advertising purposes and, therefore, would alleviate the privacy risks associated with sensitive ad preferences for users who do not opt in. However, this is unlikely to happen in the short term. Meanwhile, a straightforward action would be to stop using the ad preference "Homosexuality" (or similar ones) in countries where being homosexual is illegal, as well as other very sensitive ad preferences such as the 15 listed in this article.
FDVT Extension to Allow Users to Remove Potentially Sensitive Ad Preferences
The results reported above motivate the development of solutions that make users aware of the use of sensitive personal data for advertising purposes. It is also important to empower users to remove, in a very simple manner, those sensitive ad preferences they do not feel comfortable with. Unfortunately, the existing process FB offers is unknown to most users and complex. To this end, we have extended the FDVT browser extension to inform users about the potentially sensitive ad preferences FB has assigned them, both the currently active ones and those assigned in the past that are no longer active, and to allow users to remove with a single click either all active sensitive ad preferences or the individual ones they do not feel comfortable with.
We have introduced a new button in the FDVT extension interface labeled "Sensitive FB Preferences." When a user clicks this button, we display a page listing the user's ad preferences (both active and inactive), with the potentially sensitive ones at the top. Figure 4 shows an example of this page. For each ad preference we provide the following information: Ad preference name; Topic; and Sensitive, which indicates whether the ad preference is potentially sensitive (highlighted in yellow) or not. Next to each ad preference there is a Delete Ad Preference button to remove it individually. Another button, More Info, displays the historical information for the ad preference, including the period(s) when it has been active and the reason why FB assigned it to the user. Finally, at the top of the page we include a search bar to look for specific preferences and two buttons, Delete All Sensitive Ad Preferences and Delete All Ad Preferences, which remove all currently active potentially sensitive ad preferences and all currently active ad preferences, respectively.
Figure 4. Snapshot of the new FDVT feature that allows users to delete sensitive ad preferences.
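The ordering and bulk-deletion behavior described above can be sketched in Python. This is our illustration, not the FDVT source code: the `AdPreference` fields mirror the columns listed above, the topics are made up, and the request that actually removes a preference from the user's FB account is deliberately left out, since its interface is not described here.

```python
from dataclasses import dataclass


@dataclass
class AdPreference:
    name: str
    topic: str
    sensitive: bool  # True if the preference is in the suspected sensitive subset
    active: bool     # False if FB assigned it in the past but it is no longer active


def display_order(prefs):
    """Potentially sensitive preferences first; sorted() is stable, so the
    original order is kept within each group."""
    return sorted(prefs, key=lambda p: not p.sensitive)


def names_to_delete(prefs, only_sensitive):
    """Preferences targeted by 'Delete All Sensitive Ad Preferences'
    (only_sensitive=True) or 'Delete All Ad Preferences' (only_sensitive=False).
    Both buttons act on currently active preferences only."""
    return [p.name for p in prefs if p.active and (p.sensitive or not only_sensitive)]


prefs = [
    AdPreference("Watches", "Shopping", sensitive=False, active=True),
    AdPreference("Homosexuality", "People", sensitive=True, active=True),
    AdPreference("Pregnancy", "Family", sensitive=True, active=False),
]
print([p.name for p in display_order(prefs)])        # ['Homosexuality', 'Pregnancy', 'Watches']
print(names_to_delete(prefs, only_sensitive=True))   # ['Homosexuality']
print(names_to_delete(prefs, only_sensitive=False))  # ['Watches', 'Homosexuality']
```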
We published a prior article2 in which we analyzed the use of sensitive information on FB. That article focused on the European Union a few months before the GDPR was enforced. Members of the research community suggested in various forums that it would be interesting to extend our analysis to the use of sensitive information on FB worldwide, not just in the EU, and to understand the potential impact of the GDPR on reducing users' exposure to sensitive ad preferences. This article addresses both requests and adds two further contributions: we present two clear scenarios in which the use of sensitive ad preferences could have serious consequences for users, and we introduce an improvement of the FDVT that allows users to easily remove potentially sensitive ad preferences they do not like.
Few previous works in the literature address issues associated with sensitive personal data in online advertising; in addition, some recent works analyze privacy and discrimination issues related to FB advertising and ad preferences.
Carrascosa et al.4 propose a new methodology to quantify the portion of targeted ads received by Internet users while they browse the web. They create bots, referred to as personas, with very specific interest profiles (for example, persona interested in cars) and measure how many of the received ads match the specific interest of the analyzed persona. They create personas based on sensitive personal data (health) and demonstrate that they are also targeted with ads related to the sensitive information used to create the persona's profile.
Castelluccia et al.5 show that an attacker who gets access (for example, through a public WiFi network) to the Google ads received by a user could build an interest profile that reveals up to 58% of the user's actual interests. The authors state that if some of the unveiled interests are sensitive, this could imply serious privacy risks for users.
Facebook offers advertisers the option to commercially exploit potentially sensitive information to run tailored ad campaigns. In the best case, this practice lies within a legal gray area under the recently enforced GDPR. Our results reveal that 67% of FB users (22% of citizens) worldwide are labeled with some potentially sensitive ad preference. Interestingly, users in rich developed countries are significantly more exposed to being assigned sensitive ad preferences. Our work also reveals that the enforcement of the GDPR had a negligible impact on FB's use of sensitive ad preferences within the EU. We believe it is urgent that stakeholders within the online advertising ecosystem (that is, advertisers, ad networks, publishers, policymakers, and so on) define an unambiguous list of personal data items that should no longer be used, to protect users from privacy risks such as those described in this article.
Acknowledgments. The research leading to these results has received funding from: the European Union's Horizon 2020 innovation action programme under grant agreement No 786741 (SMOOTH project) and grant agreement No 871370 (PIMCITY project); the Ministerio de Economía, Industria y Competitividad, Spain, and the European Social Fund (EU), under the Ramón y Cajal programme (Grant RyC-2015-17732), and the Project TEXEO (Grant TEC2016-80339-R); the Ministerio de Educación, Cultura y Deporte, Spain, through the FPU programme (Grant FPU16/05852); the Community of Madrid synergy project EMPATIA-CM (Grant Y2018/TCS-5046); and the Fundación BBVA under the project AERIS.
1. Cabañas, J., Cuevas, A., and Cuevas, R. FDVT: Data valuation tool for Facebook users. In Proceedings of the 2017 CHI Conf. Human Factors in Computing Systems. ACM, New York, NY, USA, 3799–3809; https://doi.org/10.1145/3025453.3025903
4. Carrascosa, J., Mikians, J., Cuevas, R., Erramilli, V., and Laoutaris, N. I always feel like somebody's watching me: Measuring online behavioural advertising. In Proceedings of the 11th ACM Conf. Emerging Networking Experiments and Technologies. ACM, New York, NY, Article 13; https://doi.org/10.1145/2716281.2836098
5. Castelluccia, C., Kaafar, M., and Tran, M. Betrayed by your ads! In Intern. Privacy Enhancing Technologies Symp. Springer Berlin Heidelberg, 2012, 1–17.
6. European Union. Regulation 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC General Data Protection Regulation; http://eur-lex.europa.eu/eli/reg/2016/679/oj.
12. Speicher, T. et al. Potential for discrimination in online targeted advertising. In Proceedings of the 1st Conf. Fairness, Accountability and Transparency. S.A. Friedler and C. Wilson (Eds.). PMLR 81, 2018, New York, NY, USA, 5–19.
14. Venkatadri, G. et al. Privacy risks with Facebook's PII-based targeting: Auditing a data broker's advertising interface. In Proceedings of the IEEE Symp. Security and Privacy. (San Francisco, CA, USA, 2018) IEEE, 89–107.
José González Cabañas is a Ph.D. candidate and FPU scholarship holder in the Department of Telematic Engineering at the Universidad Carlos III de Madrid, Spain.
Ángel Cuevas is a Ramón y Cajal Fellow in the Department of Telematic Engineering at Universidad Carlos III de Madrid, Spain, and Fellow at UC3M-Santander Big Data Institute, Spain.
Aritz Arrate is a Ph.D. candidate in the Department of Telematic Engineering at the Universidad Carlos III de Madrid, Spain.
Rubén Cuevas is an associate professor in the Department of Telematic Engineering at the Universidad Carlos III de Madrid, Spain, and Deputy Director and Fellow at UC3M-Santander Big Data Institute, Spain.
f. https://fdvt.org/usenix2018/panelists.html. This resource includes the list of all potentially sensitive ad preferences manually labeled by the panelists, along with the five votes each of them received.